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This thesis examines a number of underwater acoustic signals and the problem of classifying 
these signals using a back-propagation neural network. The neural network classifies the signals 
based upon features extracted from the original signals. The effect on classification by using an 
adaptive line enhancer for noise reduction is explored. Two feature extraction methods have been 
implemented; modeling by an autoregressive technique using the reduced-rank covariance method, 
and the discrete wavelet transformation. Both orthonormal and non-orthonormal transforms are 
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I. INTRODUCTION 


A. SCOPE OF STUDY 


The United States Navy’s undersea acoustic surveillance systems were used during the 
Cold War to detect and track enemy submarine activities. These systems consist of both fixed and 
mobile hydrophone arrays. Many aspects of these arrays are now declassified and are being used 
by geophysicists to monitor undersea earthquakes and volcanoes. The undersea SOund 
Surveillance System (SOSUS), array out of Whidbey Island Washington is of interest to this 
thesis. It consists of fixed hydrophones mounted at the approximate depth of the deep sound 
channel. These hydrophones are very Sensitive and have the ability to pick up very low frequency 
sounds.[13 p. 3] 

Earthquakes have associated with them three phases, representing three separate types of 
energy release. First is the primary phase, or p-phase. This phase consists of energy that is 
transmitted as compressed shock waves within the earth’s crust. The secondary, or s-phase, 
consists of transverse shock waves also being transmitted through the earth’s crust. The last 
phase, the tertiary or t-phase, is only associated with underwater earthquakes. The t-phase consists 
of acoustic energy that is transmitted into the water column from the shaking ocean floor. The 
definition has been expanded recently to include any low-frequency seismic event that transmits 
acoustic energy into the water column.[13 p. 4] 

The National Oceanic and Atmospheric Administration (NOAA), Pacific Marine 
Environmental Laboratory is conducting a study of underwater geological processes, such as 
earthquakes and volcanic activities using the Whidbey Island SOSUS array. This undersea 
Surveillance system also has great potential for use in tracking and estimating populations of 
marine mammals. The purpose of this thesis is to investigate classification procedures which 
would allow for proper separation of geological processes and biological signals, and would 
classify the various biological signals for further biological studies. A Neural Network (NN) 
configuration is chosen in this study to automate the classification process. NN have great 
potentials in automatic classification problems, as they can learn through examples and do not 
require a precise mathematical model for the signal characteristics. However, NN implementations 
require the set of training data to be representative of the various signals to be classified to have 


good performance. Thus, the success of any classification scheme depends on a judicious choice of 


unique features that allows the classifier to discriminate between each class of interest. This thesis 
explores the use of autoregressive (AR) modeling and wavelet decomposition as feature extraction 
techniques. In addition, we investigate the application of an adaptive line enhancer to de-noise the 
underwater data. The AR coefficients used as NN inputs are computed using a reduced-rank 
covariance method. This technique combines traditional covariance method and the singular value 
decomposition to reduce the effect of noise in the signal model. Next, orthogonal and non- 
orthogonal implementations of the discrete wavelet transform are chosen as feature extractors. The 
orthogonal decomposition uses two different wavelet bases; Symmlet 8, and Coiflet 3. coefficients. 
The A-Trous implementation of the non-orthogonal decomposition uses a modified Morlet wavelet. 
This classification study uses 5 different species of whale (Sperm, Killer, Humpback, Gray, and 
Pilot Whale) and underwater earthquake data. 

Chapter IT describes the methodology used in the signal selection. Chapter III presents the 
analytical methods used to compute the input parameters to the Neural Network implementation. 
In this chapter, we first describe the adaptive line enhancer procedure designed to reduce the effects 
to wideband noise contained in the recordings. Next, we present the reduced-rank covariance AR 
modeling technique. Finally, we briefly review multiresolution algorithms and present their 
application to our classification problem. Chapter IV describes the back-propagation neural 
network implementation used during our study. Results are presented in Chapter V. Last, Chapter 


VI presents conclusions and recommendations for further study. 


ne 


II. SIGNAL SELECTION 

The operating environment of the undersea sonar array is in the Pacific Northwest, off the 
coast of Whidbey Island Washington. Researchers familiar with the array data have indicated the 
presence of marine mammal sounds in the recordings, as this array is located in a whale migration 
route. Thus, this system has a high potential for use in marine mammal studies. In order to separate 
underwater seismic data from marine mammal sounds, we selected recordings from a cross section of 
whale species that represent the population in the array system area during the yearly migration 
periods. The frequency characteristics of the various whale species cover the entire frequency 
spectrum. The biological signals range from very short pulses to long melodic songs. In addition to 
the underwater earthquake used, the types of biological signals chosen for the study are: Sperm 
Whale, Killer Whale, Humpback Whale, Gray Whale, and Pilot Whale. 

Note that even though the array data indicates the presence of some species of whale 
sounds, the biological and earthquake signals used were obtained from a different parce eA eee 
decision not to use biological cuts obtained from the system array data was motivated by the fact 
that we were unable to obtain proper identification of the various species of whales by a specialist. 
Thus, in order to minimize any problems due to incorrect training we used cuts from a commercial 
audio cassette in which the whale species have been properly identified. Each recording varied in 
length from fifteen to thirty seconds. The recordings were of real, open ocean encounters from 
various signal collection platforms. The signals, as an artifact of how they were collected, were all 
corrupted with background noises. The corruption of the signals includes sounds from ships, small 
boats, and other disturbances occurring in the natural environment, plus artificial noise from the 
means of collection. 

Each signal was digitized on a 486 PC using a Media Vision Pro Audio Spectrum sound card 
at 8 KHz, using a single channel and 8 bits per sample. This ensured audio compatibility with the Sun 
workstation’s audio playback which is limited to single channel and 8 bit, 8 kHz data. The digitized 
recordings were then processed to extract, as close as possible, the whale signals from the recordings, 
using audio, and visualizations of the time and frequency domains. The whale ‘songs’ were then 
selected and separated from the various other auditory manifestations of whale sounds. Next, these 
sample songs were pasted together to make a nearly continuous song for each species. The sperm 
whale signal was of a different nature. The recording is of the animals” echo ranging sonar. The very 
short, rapid pulses were all kept intact. The earthquake recording was also kept intact. The pilot 


whale sound produced a challenge as it was very short in duration, has high pitch, and was buried in 


noise. It was especially difficult to separate the background noise from the pilot whale signal. Figure 


2-1 presents typical time domain-plots from each class of signal studied. 
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Figure 2-1 Time domain signals of: a) sperm whale, b) killer whale, c) humpback whale, d) gray 
whale, e) pilot whale, and f) earthquake. 


In this chapter we have identified the signals that we are to classify. Next in Chapter III, 
we introduce the methods of feature extraction used in this study. Specifically we present the 
reduced rank AR modeling technique, the discrete wavelet transform producing wavelet based 
parameters, and the adaptive line enhancer to reduce the effects of wide-band noise contained in the 


signals. 


WI. FEATURE EXTRACTION OF BIOLOGICAL AND SEISMIC 
SIGNALS : 


A. INTRODUCTION 

The biological and seismic signals used in this study consist of data records that typically 
exceed 40,000 data points, which precludes any realistic method for classification based on the 
data directly. Therefore signal characteristics need to be expressed in ways that retain the signals’ 
unique features, yet drastically reduce the amount of data needed as inputs to the classifier. 
Feature extraction methods are a means of modeling a signal based on some specific property of 
the signa]. These features are then used to classify the signal. 

One of the most distinguishing features of an acoustic signal is its spectral content. The 
signals investigated in this study are generated by resonating cavities, vibrating vocal cords, and in 
the case of seismic signals, the vibration of rock formations rubbing against each other. As a 
result, most of the energy contained in the signals under study is concentrated in a few frequency 
components. Note that an exception to this is the sperm whale data, which contains a very wide 
spectral content. As a result, spectral information can be used to distinguish between the different 
classes of signals. Two different procedures are considered in the study to extract frequency based 
information. First we apply a reduced-rank AR modeling technique to compute AR coefficients 
which are used as inputs to the neural network classifier. Next, we use the discrete wavelet 
transform to compute wavelet-based parameters which are used as inputs to the NN. In addition, 
we investigate the application of an adaptive line enhancer (ALE) algorithm as a pre-processing 
Step to reduce the effects due to wide-band noise contained in the data. This chapter presents the 
feature extraction methods used in this study. The first part of this chapter presents the ALE noise 
reduction procedure. Next, we present the reduced-rank autoregressive modeling. The discrete 


wavelet transformation and its application to our study is presented last. 


B. NOISE REDUCTION USING ADAPTIVE LINE ENHANCEMENT 
1. Introduction 
This study investigates the application of an Adaptive Line Enhancement (ALE) algorithm 
to decrease the effect due to wideband noise present in the signals under study. The ALE algorithm 
is a gradient descent, adaptive filter based on the least mean square (LMS) algorithm. The 
algorithm tracks narrowband frequencies contained in the input signal, and removes wideband 
components that are uncorrelated to the frequencies present. First, we briefly review the concept of 
the LMS algorithm, and next we present its application in the ALE algorithm. 
2. Least Mean Square Algorithm 
The motivation for the least mean square algorithm is to design a filter that learns from its 
environment and converges to the optimum weight values given by the Wiener-Hopf equations in a 
stationary environment. The LMS algorithm is a stochastic gradient-based algorithm which is 
simple to implement and is effective in its ability to adapt to the external environment. 
a. Overview Of The Structure And Operation of the LMS Algorithm 
The system building blocks for the LMS algorithm are depicted in Figure 3-1 
below. A similar block diagram will be shown later for the ALE implementation. The input signal 
is fed into an adaptive FIR filter, and the output of the filter, y(n), is compared to the desired 
signal, d(n), forming the error e(n). The second important feature of this scheme is the adaptive 
control process where the filter weights are updated based on the direction needed to minimize the 


mean square error. The filter weights are updated in accordance with: 
w(n+ 1) = win) += MEV}, G.1) 


where w(n) is the filter weight vector, W is the step size parameter, and VJ(n) is the gradient of the 
mean square error function. The negative gradient of the mean square error function , 


-VJ(n), is approximated by the instantaneous gradient expression: 
-VJ(n) = 2u(n)-e (n), (3.2) 


where u(n)=[u(n), u(n-1), ... .4(n-P+1)]', and P is the length of the filter. 






Filter y(n] 


Input u[n] : 


Adaptive 
Control 





Figure 3-1 The LMS Algorithm. 


Thus the LMS estimated weight vector is obtained using the instantaneous gradient and is given 
by: 


w(n +1) = w(n) + pe" (n)u(n). (3.3) 
The LMS weight coefficients are given by: 
w.(n+l)=w,(n)+ pa(n—k)e"(n). kk =0,1,...,P-1 (3.4) 


The step size parameter, p1, 1s defined as a positive real constant which controls the size of 


the incremental weight correction, and is bound by: 


<r (3.5) 


P-R,(0)’ 

to insure that the algorithm converges. Note that the LMS weights exhibit a random motion 
around the optimal solution due to of the instantaneous approximation of the gradient, also known 
as gradient noise.{3, p. 300} 

The LMS algorithm is implemented as follows[3, p. 332]: 


1) At time n=0, initialize the weight vector to an estimate of w(0) 


2) y(n)= w” (n)u(n) 


3) e(n) = d(n)-y(n) 

4) w(n+1) = w (n) +pu(n)e (n) 
where the parameters of the algorithm to be chosen by the user are the length of the filter, P, and 
the step size parameter, [L. 

3. Adaptive Line Enhancement Using the LMS Algorithm 

The adaptive line enhancer using the LMS algorithm is a logical choice to reduce the 
wideband noise inherent in the sampled recordings. The motivation behind using the LMS 
algorithm is that it is simple to implement, and has been historically proven to achieve satisfactory 
performance. When used in the right conditions, the results can be of high quality. 

The signals under investigation vary widely in nature from very localized in time (broad 
frequency range), in the case of the sperm whale, to very localized in frequency (slowly changing in 
time), in case of the humpback whale. Thus the step size was chosen to be 0.05 percent of the total 
signal power as a result of a trial and error process. 

The number of filter weights was chosen to decrease the total number of frequency related 
components contained in the signals, and yet preserve enough to be able to discriminate between 
each class of signal. This study uses a filter length of 10 to extract the five most dominant 
sinusoids in each signal class. The figure below represents the implementation of the ALE 


algorithm to estimate the signals present in the biologic and seismic data. 


Input Signal u[n] 





Figure 3-2 The Adaptive Line Enhancer. 


The delay A, shifts the signal in time causing the broadband noise contained in u(n-A) to 
become uncorrelated with the broadband noise contained in the reference signal d(n) = u(n). By 
experimentation we found that a delay of one sample was sufficient to decrease the contribution of 


broadband noise without distorting the sperm whale data. 


The output of the filter reflects only the signal that is correlated to the desired signal, 
provided the filter is operating optimally. The MATLAB® program used in this study 
LMSALE.M was written by LT D. Brown, USN [14] and is included in the appendix. The 
following spectrograms represent the signals before and after applying the ALE algorithm. Note 
that the results reflect the trade off in step size, filter length, and delay for all classes. This process 
as implemented greatly improved the gray whale data, and had varying degrees of success for the 
other classes. 

Each of the following spectrograms was obtained using the MathWorks function 
SPECGRAM.M. A Hanning window of size 512 with SO % overlap is used and the FFT length 
chosen is 512. Note that the spectrograms are given in terms of normalized frequency, where half 
sampling frequency is represented by 0.5 and the time axis is expressed in terms of number of 
samples. The spectral information is mapped to color values. The range of color values is red, 
orange, yellow, green and blue. High intensity spectra appear as red. Low intensity spectra appear 


as blue. 





Spectrogram of Gray Whale Data 
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Spectrogram of ALE Gray Whale Data 
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Figure 3-3 Spectrogram of gray whale data; frequency (fs/2 
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Figure 3-4 Spectrogram of gray whale data after processing with ALE; normalized frequency 
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Figure 3-5 Spectrogram of humpback whale data; normalized frequency (fs/2 = 0.5), normalized 
time expressed in number of samples. 
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Figure 3-6 Spectrogram of humpback whale data after processing with ALE; normalized frequency 
(fs/2 = 0.5), normalized time expressed in number of samples. 
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Figure 3-7 Spectrogram of pilot whale data; normalized frequency (fs/2 
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Figure 3. -8 Spectrogram of pilot whale data after processing with ALE; normalized frequency 
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Figure 3-9 Spectrogram of earthquake data; normalized frequency (fs/2 = 0.5), normalized time 
expressed in number of samples. 
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Figure 3-10 Spectrogram of earthquake data after processing with ALE; normalized frequency 


(fs/2 = 0.5), normalized time expressed in number of samples. 
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Figure 3-11 Spectrogram of killer whale data; normalized frequency (fs/2 


expressed in number of samples. 
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Figure 3-12 Spectrogram of killer whale data after processing with ALE; normalized frequency 
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Spectrogram of Sperm Whale Data 
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Figure 3-13 Spectrogram of sperm whale data; frequency (fs/2 =0.5), normalized time expressed 
number of samples. 
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C. REDUCED-RANK AUTOREGRESSIVE MODELING 


1. Introduction 

The various classes of signals under investigation in this study exhibit differences in their 
spectra, therefore spectral information is used for classification purposes. The autoregressive (AR) 
coefficients of the filter used to model the original signal are the characterizing parameters chosen 
as input parameters because they represent the spectra of the signals. They are used to classify the 
various classes of signals under study. 

In this section, we first introduce the concept of autoregressive modeling and the 
covariance method used in this study. Next, we consider the problem of selecting the model order. 
Finally, we present the application of reduced-rank modeling to the covariance method used to 
decrease the effect of noise in the data. 

2. Autoregressive Modeling 

Autoregressive (AR) modeling is based on the idea that an original signal x(n) can be 
expressed as the output of an all-pole linear shift invariant predictive filter driven by white noise. 
In the time domain, this means that the signal x(n) may be expressed as a linear combination of 


previous values x(n-i), i= 1,2, ... P, and some input noise sequence w(n). 
P 
x(n) =—¥ a(k)x(n —k) + b,w(n), (3.6) 
k=] 


where P is the order of the predictor, bo is the gain, and (a(1), ... , a(P)) are the coefficients of the 
linear predictor to be determined. Taking the Z-transform of (3.6), the resulting transfer function 


of the system used to generate x(n) from w(n) is given by: 


= X(2) _ b, - a 
= = —____*_____= (3.7) 
WACO IMG ie ser eat A(z) 





H(z) 


The AR coefficients can be obtained by solving a set of linear equations obtained from Equation 
(3.6). Using the properties of the AR model, the correlation function R,(/) obtained from x(n) is 
given by: 

R, (I) =a, R, (I=) ap, (I- P) + BR, ), 
which leads to: 


Ra) aha = eeetae Re (rrp eRe n(n): (3.8) 
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The cross correlation term R,,(/) can be expressed as the convolution of the impulse response h(n) 
of the AR system with the autocorrelation of the noise sequence R,(n) as: 


R,, (I) =h()*R, (I), 


where R,()=076(), 
which leads to Rev oe on) 
Thus, 
R,@=6.h ©). (3.9) 


By substituting (3.9) into (3.8), the correlation difference equation becomes: 
R,()+4,R,(1-1)+--+apR,(1— P) =b,o%h (-l). (3.10) 
Note that h(n) is the impulse response of a causal filter, therefore h(n) is equal to 0 for n< 0. 


Next, using the Initial Value Theorem we have: 


b 
A(0)=lim H@)= lim———-—— =. Ga) 
Zee zoe 1+a,Z ee ae 


Therefore, 
RG. cametor! = 0 
ee eter. 


Thus, Equation (3.10) becomes: 


R,.(1) +a,R,(1-1)+---+a,pR,(1- P) =b,0.,8(), for12=0. (3.12) 


Expressing Equations (3.12) for 1 = 0,..., P, leads to the extended set of Yule-Walker equations 
(1, p.414] 


RO) R(-l) + RCP) Tf 7] fo2tage 
RO R_(0 -- R(-P+1 

1 LO Lee no _ 
R(P) R(P-1) - — -R,) Jha, 0 


The set of AR coefficients of the filter a = {1, Bee can be obtained by solving the set 


of linear equations derived from Equation (3.13): 


7 


R, (0) ie Ge tal) ||| 2 R, (+1) 


R.(1 ROW R= P +2) | | a, Rae 
me +2) 2, |_| Ce) Tae 
RP eRe 2 es eRe) ap R.(+P) 
he a = d. 


Note that in practical situations the true correlation matrix is usually not known and has to be 
estimated from the observed data. Various estimation procedures have been considered in [1]. In 
this study the covariance method is used to estimate the correlation matrix because no assumptions 
are made about the data beyond the length of the prediction filter. Note that this is different from 
the autocorrelation method where the data is zero padded. 

3. The Covariance Method 

In modeling real signals where the true first and second order statistics are not known, the 
correlation structure has to be estimated from the observed data, and used to solve for the AR 
coefficients and the gain term bp *. The covariance method uses the following equation to estimate 


correlation lags: 
RO=— F "(n ~1)x(n) 3.15) 
=—— 9 x (n—/)x(n). : 
‘ N —P nap ( ( 


The gain term b’o o°, can then be estimated from the estimated a coefficients by: 





b, 





Pp A 
*o2 = Valk): Re(k), where dp =1. (3.16) 
k=0 


The estimated correlation matrix resulting from Equation (3.15) is hermitian (R, (ky= a (—k)). 
Note that it is singular if the data consists of P-1 or fewer complex sinusoids. However any noise 
in the observed data will cause the matrix to become non-singular. A noted drawback to the 
covariance method is that the resulting estimated pole locations are not guaranteed to be inside the 
unit circle. [2, p. 223} Hence the AR filter is not guaranteed to be stable. However this noted 
drawback does not prevent using the AR parameters to characterize the signal properties. In 
addition, note that the strength of the covariance method over the autocorrelation method is that if 
the signal to be modeled is of pure sinusoids, the covariance method can be used to perfectly model 
the frequencies. This property is not shared by the autocorrelation method.[2 p. 223] 

The main frequencies, z,, obtained by the AR modeling procedure can be estimated as the 


roots of the polynomial: 
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A(z) =1+4,27'+---tapz”. (3.17) 
An estimate of the spectrum can be obtained from the recursive AR model: 


x(n) + a,x(n —1)4+---+apx(n — P) = E(n) (3.18) 
where €(7) is the modeling error. Ideally ¢(7) is a Gaussian white noise sequence with a gain equal 
to Ibol*o,,”, which leads to the frequency spectrum: 


Dale 
Seu O< o< 2n. (3.19) 


S.(e# as, 
ee) = TAe® P = 


4. Model Order Selection 

Selecting the order of an AR model is a difficult task. The best choice is usually not 
known, and trial and error methods are sometimes used. If the data is truly described by a finite 
order AR model, then theoretically the variance will become constant once the model order is 
reached. In practice this is not usually true for a variety of reasons. The estimate may not 
converge, or if it does, it may be difficult to judge exactly when this occurs. A number of criteria 
have been developed: the four most well known are Akaike’s Information-theoretic (AIC), Parzen’s 
criterion Autoregressive transfer (CAT), Akaike’s final prediction error (FPE) and Schwartz and 
Rissanen’s minimum description length (MDL).{1, p. 549] In this study the initial model order 
was chosen by estimating the model order using AIC, MDL, CAT and FPE criteria on the sperm 
whale signal. These criteria were implemented earlier in the program ORDER.M written by LT 
Ken Frack, USN. [15] The sperm whale signal was used to set the model order because it is highly 
localized in time and has the broadest bandwidth of all various signal classes considered in this 
study. Thus, a larger model order may be needed to accurately describe it. The other signals are 
more localized in frequency and typically need a lower model order unless the signal has a low 
signal to noise ratio. The results of running ORDER. M are shown below with the noted criteria. 
For this study the model order was chosen to be 25 based on the mean of the four minimum model 


orders. 
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Figure 3-15 Model order selection for sperm whale using AIC(dotted line), and MDL 


criteria (solid line). 
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Figure 3-16 Model order selection for sperm whale a) CAT, b) FPE criteria. 


5. Reduced-Rank Covariance Method 
When a sinusoidal signal is embedded in noise, AR modeling methods generally perform 


poorly for pole position estimates.[2, p. 225] The mechanism used in this study to improve the 
covariance method is to reduce the rank of the estimated correlation matrix with the singular value 
decomposition (SVD). The theory behind this method is to separate the contribution due to noise 
from that of the signal-plus-noise by applying an SVD to the signal correlation matrix. A review 
of the SVD is introduced next, followed by examples of the reduced rank covariance method used 
in this study. 

a. The Singular Value Decomposition 

The singular value decomposition theorem states that any M x N matrix where 


M >N can be decomposed into the product of three matrices. 
R=UxV", (3.20) 
where the matrix U is the M x M unitary matrix containing the so called left singular vectors of R, 


= | 
LE a ees ell (3.21) 
aoe | 


the matrix V is an N x N unitary matrix containing the right singular vectors of R, 


V =| vigil (3.22) 
| | | 
and the matrix & is an M x N diagonal matrix containing the singular values of R, 
Ore =O, 0 
OG eo 
L =| 00m oe, (3.23) 
Deal) 0 
© 0 0 


where 0; > 02 >... > On.[], p.54-55] The SVD allows the user to decompose the signal into its 
principal components. The assumption is that the singular values associated with the signal will be 


larger than the singular values associated with the underlying noise. As a result the singular values 


associated with the signal and noise components are separated from the singular values associated 
with the noise by a gap. With this decomposition one can identify the signal components and invert 
only the portion of the SVD decomposition pertaining to the signal only. 

b. Rank Reduction 

Applying the SVD to the covariance method leads to solving for the a coefficients 


in the following manner from Equation (3.14): 


a=—R* -d, 
where R*=UGID>Y “VCO, 
and > = diacdiic 2 ia.) (3.24) 


R’ is the pseudo inverse of R of rank k, where k is the number of large singular values contained in 
x. The rank reduction can be viewed as a truncated SVD that sets smaller singular values 
associated with the noise to zero, thereby canceling ill-conditioning problems that would occur 
when inverting an almost singular matrix. The selection of the singular values to Keep is done 
visually by detecting a gap in the plot of the singular values. The method used in this work was 
done by visual inspection followed by a comparison of the AR spectrum with the Fourier spectrum . 
of the signal segment. We used this method because the underlying noise is not constant, and the 
signal to noise ratio varies with each signal segment analyzed. This comparison allowed us to see 
how well the model tracts the dominant frequencies of the signal. Not being cognizant of the 
resulting AR spectrum could lead to cutting out some of the signal contributions. When the data 
consists of a signal with a high signal to noise ratio, the rank of the AR covariance matrix can 
easily be reduced by detecting the gap between the singular values of the signal and noise and those 
of just the noise. The underlying assumptions are that the signal is much stronger than the noise, 
and is close to being stationary in the segment interval. As a result, a signal fitting this criterion 
will have singular values that are much more prominent than the singular values associated with 
the noise. The singular values associated with the noise will be almost constant and small in 
magnitude. The following figures are typical examples of singular value plots of the six categories 
of signals used in this study. Also shown are plots of the AR spectrum plotted over that of the 


Fourier spectrum of the typical signal segments. 
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Figure 3-17 Pilot whale data a) singular values of AR covariance matrix of order 25, b) Typical AR 


and frequency spectra of data segment of length 512. 
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Figure 3-18 Earthquake data a) singular values of AR covariance matrix of order 25, b) Typical AR 


and frequency spectra of data segment of length 512. 
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Figure 3-19 Gray whale data a) singular values of AR covariance matrix of order 25, b) Typical AR 


and frequency spectra of data segment of length 512. 
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Figure 3-20 Humpback whale data a) singular values of AR covariance matrix of order 25, b) 


Typical AR and frequency spectra of data segment of length 512. 
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Figure 3-21 Killer whale data a) singular values of AR covariance matrix of order 25, b) Typical AR 


and frequency spectra of data segment of length 512. 
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Figure 3-22 Sperm whale data a) singular values of AR covariance matrix of order 25, b) Typical 


AR and frequency spectra of data segment of length 512. 


nN 
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Table 3.1 depicts the typical rank chosen when modeling the signals in this study. Note 
that the sperm whale requires the highest rank, which is due to the wide frequency bandwidth of the 
Signal. The gray whale also requires a relatively high reduced rank due to the harmonic qualities of 


the signal. Finally note that earthquake and humpback whale signals require 4 reduced rank equal 


to 2. 
Ba 












1 
Sperm whale | ] 
Gray whale l 


Table 3.1 Typical number of singular values selected for retention for each class of signal. 


D. THE DISCRETE WAVELET TRANSFORMATION 

1. Introduction 

Wavelet theory provides a unified framework for a number of techniques that are applied 
in signal processing. These techniques include multiresolution signal processing, used in computer 
vision, subband coding, developed for speech and image compression, and wavelet series 
expansions, developed in applied mathematics. The Discrete Wavelet Transform (DWT) is used 
on this study as a classification tool to take advantage of the filters’ constant-Q spectral 
characteristics which may match the spectral characteristics of the biological signals under study 
well. The signals analyzed in this study are non-stationary in nature. They vary in frequency 
content, signal length, magnitude and background noise. Analysis by classical means is therefore 
difficult. The DWT provides an alternate to the Discrete Short-Time Fourier Transform, 
(DSTFT), in that unlike the DSTFT, the DWT uses an analysis window that can be dilated and 
contracted to give different resolutions in frequency and time on the time-scale plane. The DWT 
allows for the localization in time of high frequency, fast changing signals and allows for the 
localization in frequency of low frequency, slow changing portions of the signal. 

Analogous to the time-frequency plane in the DFT, the signal is mapped onto the time- 
scale plane. Scale is inversely proportional to frequency and denotes the level of contraction or 


dilation of the basis filter. Scale and level are used interchangeably and are synonymous. 


This section presents a review of the DWT starting from its relationship to the Fourier 
transform and the Short time Fourier transform. Next we present two discrete implementations of 
the wavelet transform; the Mallat’s algorithm and the A-Trous al gorithm. Finally, we present how 
each of these algorithms are applied in the study to derive the parameters of average energy per 
scale for the Mallat’s algorithm and average energy per voice per scale for the A-Trous Algorithm. 

2. The Continuous Wavelet Transform And Series 

We first present the similarities between the WT and the continuous Fourier transform 
and the Short-Time Fourier Transform to tie wavelets to a classical signal processing tool. The 
continuous-time Fourier Transform is an important tool in the analysis of stationary signals as it 
describes the signal using a basis of complex exponentials. The Fourier Transform of a signal is 


given by: 


F(@)= if mie ae (3.25) 


oD 


The Fourier transform has many drawbacks when applied to non-stationary data, as all non- 
stationary information will be ‘averaged out’. The Short -Time Fourier Transform (STFT) 
overcomes some of these drawbacks by sliding a window, w(t) over the data to be analyzed. The 
Fourier transform of each successive segment of data is then computed to extract the Fourier 


transform information over each segment. The STFT is defined by: 


STFT, (w,t) = | f (w(t —1)e-™ dt, (3.26) 


—oO 


where the finite windowing function, w(t-7), is centered around time t. The localized signal is 
transformed giving the frequency representation at that time. The window is then shifted along the 
time axis and the procedure is repeated. The resulting transform is time dependent, and forms a 
time frequency representation of the signal. 

A noted drawback to the STFT is the fixed window length of this transform. Thus, the 
transform can not adapt to the changing characteristics of a signal at a certain point in time. To 
address this drawback, another type of transform was developed, the continuous time wavelet 
transform (CTWT). The CTWT is formed by taking the inner product of a signal with basis 
functions that can be both dilated or contracted, and shifted in time. The CTWT is defined as 





CTWT, (a,b) = T= j row ( =" a. (3.27) 
Ooi 
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The basis functions, or wavelets, defined as: 





l t—b 
Y,,(t) = a v( | (3.28) 


a 


are oscillatory in nature. They taper to zero at infinity and negative infinity or are zero outside of 
the support interval. The argument a is called the scale parameter, and the argument D is the time 
localization variable. By changing the scale parameter, the basis function either contracts, for a 
small a, or dilates for a large a. Note that the scale parameter a is inversely proportional to the 
frequency: a small a denotes a high frequency, while a large a denotes a low frequency. Thus the 
transform can adjust to the changing nature of a signal by the dilation and contraction of the 
wavelet function. An easy way to visualize this difference is to draw the time-frequency tiling of 


both the STFT and the CTWT side by side. 


Frequency Frequency 


(a) Time (b) Time 





Figure 3-23 Tiling of the time-frequency plane. (a) Short-time Fourier transform; (b) Continuous 


time wavelet transform. 


As pictured, the STFT (left) has a uniform window length which creates a rectangular grid 
over the time-frequency plane. The uniform window length provides the same localization in time 
for all frequency components of the signal. The CTWT (right) provides a longer window for low 
frequency signals that do not change abruptly in time, and a shorter window for high frequency 
Signal that change rapidly with time. Thus the CTWT adjusts to the changing nature of the signal. 
Note that both the STFT and the CTFT must satisfy the property that each time-frequency tile, or 


time-scale tile, have uniform area. 
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In addition, the function y(t) has to satisfy the following condition in order to be able to 
reconstruct the signal f(t) from its CT WT: 


oo Z 
C= I'F(@)I 


ia lal 


—Oo 


da@<co, (3.29) 


where ‘¥(@) is the Fourier transform of y(t). This ensures the transform is a bounded invertible 
operator in the appropriate spaces [6 p. 2645]. If w(t) tapers to zero at infinity and oscillates, then 


it must have zero mean, 
[w(Odt = 0. (3.30) 


The inverse transform or reconstruction of the signal is defined as: 


ih) = =i CTWT, (a,b) pay (=P) (3.31) 
The factor 1/a’ in the integration is the Jacobian. The signal f(t) can be described as the 
Summation of the basis functions y.,(t) and the coefficients CTWT\{a,b). The constant Cy 
depends only on the basis function y,,(t). The measure in this integral, dadb, is formally 
equivalent to integrating over time and frequency or dtdf [5 p.14]. We assume that both the 
wavelets and the signal are real-valued or complex-analytic so only positive dilations need be 
considered. 

3. Discrete Wavelet Transform 

We must consider the discrete version of the CT WT, as our study deals with sampled, 
discrete signals. Thus, in the following we consider discrete values for a, a=2' where i is termed 


the octave of the transform. The parameter b relates to discrete time, which leads to: 
; ] ' * [ — b 
CV 2) ree) Ja (3.32) 
If we now take b to be a multiple of a or b = 2'n we have: 
CTWI,(2' 2 n)= _ frow'(Z n ir (3.33) 
, 2 V2 J 9: : ° 


The next step is to discretize the integral by replacing it with a summation to obtain the wavelet 


series expansion: 


nme ] leaks 
w(2' ,2'n) = —= (k (< -n}. (3.34) 
Te POM a 


The sample rate has been chosen to equal one. Equation (3.34) is called the decimated wavelet 
transform, as indicated by the 2'n on the left hand side. The transform is only computed every 2’ 
Samples at octave 1. 

To further discretize the wavelet function we need to breakdown the wavelets into two 
filters: the analysis filter and the scaling filter. Through the appropriate use of the discretized 
filters representing the sampled wavelets, we can build different algorithms that accomplish 
wavelet analysis. Let g be the discrete highpass analysis filter obtained from sampling the 
truncated wavelet function: 

8, =n). 
Proceeding from Equation(3.34), we can arrive at two different algorithms that represent 
applications of the discrete wavelet transform (DWT). [6 p. 2465] The difference in the resulting 
algorithms comes from the definition of the relationship between the scaling filter, h, and the 
analysis filter, g. The scaling filter is a low pass filter that yields the next scale that the signal will 
be analyzed at. The scaling filter operates on the signal spectrum from 0 to fs/4, where fs is the 
sampling frequency. The analysis filter is a high pass filter that defines the coefficients to be 
analyzed. The analysis filter thus operates on the spectrum from /s/4 to fs/2. The first algorithm is 
Mallat’s multiresolution algorithm which is an orthogonal wavelet transform. The second 
algorithm is the A-Trous algorithm, which is a non-orthogonal wavelet transform. 

4. Multiresolution Algorithm 

This section describes Mallat’s multiresolution algorithm which is illustrated in Figure 3- 
24 below, where the “12” indicates decimation by 2. The wavelet coefficients d' are obtained as 
the outputs of the combination of a lowpass filter g(z) followed by decimation by 2, and a high- 


pass filter h(z). 


EP ER rer 
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Figure 3-24 Mallat's multiresolution algorithm. 
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In the Multiresolution algorithm the following signals are formed: 


sit! = A(h*s') 
d'" = A(g*s'), 


(3.35) 


where A is the decimation operation. The filter / is a lowpass scaling filter. The filter g is the 
high pass discretized wavelet function. This algorithm leads to an orthogonal wavelet transform 
when the wavelet filters satisfy specific requirements. In addition, the scaling filter fand the 


analysis filter g, satisfy the following properties [5 p28, 6 p2968]: 

h(L -1-n)=(-))" 2(n), (3.36) 
and [6 p. 2468]: 

Ze [Pe2j-mltay ng + 825 mB2)-0)= Sims 


5 hy; goa = 0, 
7 


ee (3.37) 
£20, 
vA, = V2. 


Several families of wavelets have been found to satisfy the above requirements. In this 
study we used two orthogonal basis sets; Coiflet 3 and Symmlet 8. These basis sets were included 
in the software toolbox for MATLAB® “Wavelab .55” produced by D.L. Donoho et. al. of 
Stanford University [7,8]. Each of these wavelets were chosen for their high degree of regularity, 
where regularity implies that: y(n) = g"(n) = g(-n). Therefore, the DWT acting on the sampled 
signal 1s exactly the sampled output of the continuous wavelet transform [6 p. 2466]. Figure 3-25 
presents the spectral partitioning obtained for the DWT with Symmlet 8 wavelet coefficients. The 
high degree of regularity is shown by the small amount of leakage to the adjacent frequency bands 
covered by higher scales. 
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scaling coefficients = Symmiet, 8 





Figure 3-25 Frequency resolution of filter bank using Symmlet 8 wavelet coefficients. 


a. Application of the Multiresolution Algorithm to signal classification 


The wavelet-based feature coefficients selected for classification when using the 
orthonormal wavelet decomposition are made up of two sets of coefficients: 1) the average energy 
contained in wavelet-based quantities obtained from scales 2 to 8, and 2) the average energy 
contained in the low-pass signal coefficients obtained at the same scales. Scale | is not decimated 
and is not used in this application. 


The average energy quantity E; obtained at scale 7 is obtained from: 


256-2! 


E. = <> i Dae ale = Jes (3.38) 
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where c;,, represents the k"" wavelet coefficient obtained at time lag 2“ and at scale i. The average 
energy of the low pass coefficients are found using the same equation. The wavelet coefficients C;, 
and the lowpass coefficients are derived using the program Ecoeff.M, which calls function 


FWT_PO.M [8]. Thus a seven scale decomposition of a signal segment of 512 points leads to 14 


5. The A-Trous Algorithm 
The A-Trous algorithm differs from the Mallat algorithm only by the decimation of the 


high pass filter output. Figure 3-26 below represents the basic algorithm, where the +2 represents 


decimation by 2. The output of the transformation w , is the decimated discrete wavelet transform 


of the original signal. 


Figure 3-26 A Trous wavelet filter bank structure. 


Further insight may be gained when the algorithm is viewed through the frequency domain 
as follows: 

A) Bandpass the upper half of the spectrum, using analysis filter g, to yield w. 

B) Lowpass filter to obtain the lower half of the spectrum [0,2/2] using scaling filter f, 

where the sampling frequency is equal to 27. 

C) Decimate to expand the lower half to [0, 7]. 

D) Go to A). 
The algorithm is straightforward. The analysis filter output w', is obtained by using g and 
represents the high frequency information of the signal s'. We then low pass filter the remaining 
signal using the scaling filter f. By doing this, the low frequency portion of the signal that has not 
been examined is retained and is not aliased by the upper frequency band of the signal in the 
dilation that follows. Decimating the signal in time dilates the signal in the frequency domain. 
Thus the low frequency signal energy is now spread throughout the entire spectrum. The result is 
octave i+]. 

Potential problems exist if we choose the high pass filter g to have a bandwidth to be less 
than z/2. This would cause part of the signal not to be examined, and thus be lost. If we replace 
the single analysis filter g by a bank of filters of the type g to cover the entire upper half of the 


spectrum, we are introducing voices. Two things are accomplished with voices: 1) we ensure we 


oS 


have sufficient bandwidth to cover the upper spectrum and 2) we add the benefit of increased 
resolution in the high frequency band of the signal. Thus the voices are constructed of banks of 
bandpass modified Gaussian filters. Any number of voices can be used to increase resolution. 

The difficulty in implementing Equation (3.34) to obtain the wavelet coefficients is that, as 
the octave, i, increases the continuous wavelet, y(t), has to be sampled at more and more points 
creating a large computational burden. The solution to this computational burden is to 
approximate the non-integral values through interpolation via a finite scaling filter f called a 


Lagrange interpolation filter. A Lagrange interpolation filter is such that 





[6 p. 2468] 
hx h® (3.39) 
ms 3. 
V2 
where / is an appropriate Daubechies wavelet filter. In this application we implemented a 
Daubechies 4 as the basis for the Lagrange interpolation filter as in [6 p. 2475]. 
The Morlet wavelet is used in the A-Trous implementation and is given by: 
w(t) =eMe hr? (3.40) 


where B is the roll-off parameter which determines how fast the modified Gaussian filter decays to 
1/e of its peak value. The center frequency, v,of the first modified Gaussian filter or voice is set at 


some fraction of m above 7/2.[6 p. 2478] The Fourier transform of w(t) is given by 

I -(w-v)* /2 B" 

V(w)=—e (3.41) 
B 

The following restrictions need to be satisfied by the parameters v, and B: 

1 

a (3.42) 

2 


The center frequency of the voice must be greater than r/2 in order for g to be in the upper half of 


the frequency spectrum. Next, in order for y(t) to be analytic, and admissible, 


B<—. (3.43) 


Finally in order that the spectrum not be aliased, 


Ve ee (3.44) 
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Any number of voices, i.e. sub-filters defined at each scale, can be chosen. 


The number of voices M needed to fill the upper spectrum is [6 p. 2478]: 


TR 2 Bl. 
2125 28 


(3.45) 





Voices add an extra dimension to the DWT in that greater frequency resolution can thus be seen 
per octave. This higher resolution is what makes this implementation of the DWT of interest to us, 
because it lets us discriminate between different signals that lie close together in the frequency 
domain. 

a. Application of the A-Trous Algorithm to signal classification 

The signals under study sometimes occupy the same octave level making it 
difficult to separate them. The A-Trous algorithm is applied in this study using a range of three to 
seven voices per octave which increases the frequency resolution of the transform to better separate 
the signals. Figure 3-27 represents the spectral partitioning obtained using four voices per octave, 


a center frequency, v = .85n, anda B =0.15. 


A Trous Impl.:4 voice(s) - beta:0.15 nu:0.85pi 


magnitude 





0 0.1 0.2 0.3 0.4 OD 0.6 OL7 
normalized frequency 


Figure 3-27 Frequency coverage of the A Trous algorithm using seven octaves, B = 0.15, and v = 0.85 
Tt. 
As before, each signal is segmented into 512 point segments and normalized to have unit power per 


segment. Seven octaves are used to include the low frequency resolution of the Earthquake signal. 


The resulting average energy per scale 7, and voice j, E;; was then computed for each segment 
using: 


1 256—2'* IN 
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j-0,1, ..., N-1, (3.46) 


where N is the number of voices chosen, and ¢i;, represent the k” wavelet coefficient obtained at 
scale 1 and voice 7. The average energy per scale per voice per segment is then processed by the 
back-propagation neural network for classification. 

In his chapter, we have introduced the methods of feature extraction used in this work. 
Next in Chapter IV, the various signal features are used as inputs to a back-propagation neural 
network for classification purposes. We present the concept of the back-propagation neural 


network and the neural network choices selected in this work. 
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IV. CLASSIFICATION VIA BACK-PROPAGATION NEURAL 
NETWORK 


A. INTRODUCTION 

Through the feature extraction process, signals under study are reduced from an average of 
40,000 data points to a much smaller number of related coefficients. Back-propagation neural 
networks have been proven useful in numerous complex classification problems [9], and this 
section introduces the back-propagation neural network configuration used for the classification 
procedure. 

The back-propagation network is a multilayer feedforward network. Typically it consists 
of an input layer, one or more computational layers called hidden layers and an output layer. The 
network learns during a supervised training session, where each input vector has a target output 
vector. Learning takes place when input related coefficients are presented to the network in the 
input layer. The input is propagated through the network in a forward direction, on a layer-by- 
layer basis, to the output layer. The output layer is compared to the target classification and the 
error is back-propagated through the network layer-by-layer, neuron by neuron, updating the 
connection weights. The connection weights are the memory of the system. Once the network 
converges on a stopping criteria, the weights become fixed and the network can be used for testing. 
The testing procedure is conducted using data with the same characteristic as those used in the 
training phases. The NN output is then compared with the known target values and a classification 
rate is tabulated. 

This section introduces the concept of the back-propagation network as applied in this 


thesis, and the network choices selected for training the network. 


B. THE BACK-PROPAGATION NEURAL NETWORK 

This study employed networks consisting of an input layer, two hidden layers, and an 
output layer. Each layer is fully connected to the succeeding layer, meaning the output of each 
neuron is connected to the input of each neuron in the next layer. During learning, information is 
paired with a desired response or target. The output of the output layer neurons is compared with 
the target producing the local error at the output. This learning scheme where the network is given 


both the input and the target classification is called supervised learning. The local error is 


a7 


propagated back through the network, and used to update the connection weights. The back- 
propagation network employed has a hetero-associative memory, meaning the pattern on recall 
from memory is purposely different from the input pattern[10 p. N. 315]. In other words, the 
network learns higher order relationships for each classification of input data and classifies based 


on these relationships. A diagram of a typical back-propagation network is given next. 


Output layer 
Hidden layer 2 


Hidden layer 1 


Input buffer 





Figure 4-1 Typical back-propagation network. 


1. The Processing Element 


Figure 4-1 presents the basic building block of the neural network, called the processing 
element (pictured as circles) . A widely accepted diagram for the processing element (PE) 1s shown 
in Figure 4-2. Each PE linearly combines the individually weighted inputs from the previous 
layer, and a bias value. It then transforms the combination through a nonlinear transfer function to 
form the output of that PE. Each output is then input to a processing element in the next layer and 
individually weighted. Note that the input layer structure is different from the remaining layers, as 
it does not process the data, but serves as a buffer that distributes the input to the hidden layers 
where processing occurs. 

The following notations are used in this section to define the NN elements: 

e x! — output of the j” neuron in the s layer, 

e w;,'! connection weight joining the i” neuron in the [s-1] layer to the j” neuron in 

layer s, 


e J"! ->weighted summation of inputs to the j" neuron in layer s. 
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Transfer Function 


x;! Output 


Synaptic Weights 
(including bias) 


Figure 4-2 Generic Processing Element 


2. The Transfer Function 


A back-propagation element transfers its inputs as follows[10, p. NC-64]: 


ee = f E(w, a) = o({ 1) ; (4.1) 


() is traditionally a sigmoid function, but can be any differentiable function. In this study, a 


hyperbolic tangent is used and is defined as: 


—_— 7 


et —e7 
Q(z) = eS (4.2) 


—_—> 


Cas 


This transfer function is asymetric about the origin and has a range of output values from -1] to +1. 
An error signal originates at each output neuron of the network. Thus, the network as a 
whole has a global error function that is a compilation of each error at each output neuron. Given 
that a network has some global error function E that is a differentiable function of all of the 
connection weights in the network, the critical parameter that is projected back through the network 


is the local error. The local error is defined as: 


5) 


dE 
e _-— 


J él 4, ° (4.3) 
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This equation defines the local error obtained at the j” PE in the s“ layer. Using the chain rule 
twice yields the relationship between the local error at a specific PE in level s and all of the local 


errors at the previous level s+/: 


e =G Ml) veer wy). (4.4) 


The derivative of the hyperbolic tangent transfer function, @, is: 

Q(z) = (1.0 + p(z)) * (1.0 — gz) (4.5) 
Therefore the local error can be rewritten as: 

e = (02) C0= 2 aye, ww," )- (4.6) 
Equations 4.1 through 4.4 are the main workings of the back-propagation network. Again the idea 
is to forward propagate the input to the output, compare the output to the target to determine the 
local error at the output, then back-propagate the local error to the input layer. The goal of the 
super vised learning process is to minimize the global error by adjusting the weights, imparting 


knowledge of the local error to each PE. This is done, as in the LMS algorithm, through the 
gradient descent method. The gradient of the global error is approximated by: 


(s] 
oE = OE ° Bes = =e) 4 xi . (4.7) 
Sw} | atl} | awl | 


The weights are updated in accordance with the size and direction of the negative gradient on the 








error surface scaled by the learning coefficient /coef in accordance with the following equation 


[10, p. NC-66]: 
Aw!) = Icoef (ei)! + x;""). (4.8) 


3. The Normalized-Cumulative Delta Learning Rule 
Learning coefficients are determined by the specific learning algorithm employed. This 
study used the Normalized-Cumulative Delta learning rule, where the weight update equation 


becomes: 
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a ae er: 
m,'=m, +C, *e, * x, at each iteration 
Ww, =Ww, +m; + C14, 


(4.9) 
a,,'= m,. if lear count modulo Epoch = 0 


An Epoch is the number of iterations the network will use to either converge or stop processing. C, 
is defined as the learning rate and is related to the Epoch size. The modulo of the Epoch is defined 
as the number of inputs to be considered as an example set. The number of the Epochs determines 
the value for the learning rate. As the number of the Epochs increase, the learning rate should 
decrease, otherwise the accumulated weight changes will cause the learning to diverge. C, is the 
momentum term and is used to help smooth out the weight changes. m,; is the memory term where 
m is the present memory of the PE and m’ is the resulting memory change. The weight changes are 
accumulated over the modulo of the Epoch and are applied at the end of the modulo of the Epoch. 
While the network is iterating the learn count modulo Epoch is equal to zero. At the end of the 
modulo of the Epoch, learn count is equal to 1 and the weight changes are applied. The 
Normalized- Cumulative Delta learning rule scales the learning rate C, by one over the square root 
of the epoch size. The modulo of the Epoch for this study was set at 100 training files for each 
network, where one file is One input vector of features extracted by one of the methods addressed in 
chapter three coupled with its output target. The target is the true classification of the signal. 

4. MinMax Tables 

Saturation of the transfer function occurs when input to the neural network is not suitably 
scaled to the transfer function. The hyperbolic tangent transfer function acts nearly linear to inputs 
which are in the range of + 2. If input values to the network are extremely large, for example 
10,000, even small weights will cause saturation of the transfer functions. When the transfer 
function is saturated, the derivative of the transfer function is nearly zero. This causes the weights 
not to be updated and the network does not learn. To avoid this hazard, MinMax tables are 
generated to scale the input to the network to the transfer function. This pre-processing function is 
available in the Neural Works professional II/Plus software [10] and was used in this study. 

5. Network Architecture 

The number of PEs in the hidden layer, and the number of hidden layers in a back- 


propagation neural network are important decisions in network architecture. Most back- 
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propagation networks will have one or two hidden layers with the number of PEs in the hidden 
layers falling in between the number of input values and the number of output PEs. The number of 
PEs depends on the complexity of the relationships between classifications of data. Signals that 
are not easily separated will require more PEs to distinguish between them. A rule of thumb most 
quoted in literature for a single hidden layer network 1s [9 p. 39}: 


a # of training cases (4.10) 
5x (m+n) 


where: 
e isthe number of PEs in the hidden layer, 


e cases are the number of records in the training set, 
e mis the number of PEs in the output layer, 
e isthe number of PEs in the input layer. 
Calculating the number of PEs in the hidden layer for the reduced-rank covariance coefficients 


equals: 
(a 
5x (6 + 25) 
Actually trying the network with twenty five input, four hidden, and six output PEs produced a 


=3.48=4. 


network that would not converge to a reasonable classification rate in a five hour time frame. The 
network architecture that constantly converged in a reasonable time frame for this study included a 
first hidden layer equal to the number of inputs, followed by fifteen in the second hidden layer, and 
six output classes. The networks usually trained for an hour to an hour and an half to reach 
stopping criteria. Some data sets responded better with different parameters, however generally 
speaking this choice worked well. A very limited attempt was made to optimize the network, 
resulting in no significant performance improvement. The emphasis of this work was to prove the 
concept of using the different feature extraction methods with a neural network classifier. 
Optimizing the neural network structure may increase the classification rate, but generally will be 
close to the values obtained in this study. The more effective the feature extraction 1s at finding 
unique values for each signal, the better the results will be using the neural network as a classifier. 
The signals used in this study were not easily separated, and thus the two hidden layer network 


worked much better than the single hidden layer network derived from the rule of thumb. 


42 


6. The Classification Rate 


This study employed a classification rate as the measure of performance for the network. 
The idea behind the classification rate is for the network to pick a winner, which is simply the 
output PE with the largest value. Thus, if we compare the winner with the target we have a binary 
yes or no answer for correctness in classification. NeuralWorks built such an instrument into the 
software. 

The classification rate instrument provides a two-dimensional comparison of desired 
results to actual network response. This instrument is appropriate for this problem because the 
network needs to classify each signal as one of the six biologic or seismic categories. The output 
response of the network is thresholded with a 1-of- n transformation. The winner is valued at 1, 
and the others are valued at zero forming the winning vector. The sum of the winners are divided 
by the number of input sets per output category. This reveals how the network classified each 
category of input data as a number from 0 to 1.0, and the overall classification rate of the entire net 
is the average of the six correct classification rates per category. 

The dimension of the classification rate instrument is a square of size of the output layer. 
In this study, the output layer contains PEs representing the six classes of signals. The 
classification rate instrument was thus a 6x6 matrix. A value of 1.0 in any of the 6 boxes per 
column means for that input all were classified as that particular output. A zero corresponds to 
none of the input were classified as that output. The perfect classification would result in 1.0s on 
the antidiagonal and zeros every where else. The values obtained from this instrument are included 
in the results section incorporated in Ths output matrix. 

One drawback to the classification rate is that this quantity doesn’t relate how close the 
maximum PE output value is from the other output values. Thus in addition to the Classification 
Rate, the mean and standard deviation of each output PE value are computed to give additional 
information regarding the NN performance. 

7. Training and Testing The Neural Network 

The goal of training a back-propagation network is to encode as many training examples 
as needed to correctly generalize [11 p. 176]. A network is said to “generalize” well when the 
classification rate computed by the network is reasonably high for test data which was not used 
during the training phase. However, it is assumed that the test data is drawn from data with the 


same general characteristics as the training data. The generalization process can be viewed as non- 
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linear curve fitting, or a non-linear input-output mapping. This view point allows us to visualize 
the learning process not as a “magical life giving process’ but as a good non-linear interpolation of 
the data. A network that is designed well and is adequately trained should still have a high 
classification rate when tested on data that is slightly different from the data used for training. 
When the network learns too many ambiguous input-output relationships, that is, when it is 
overtrained, the network may produce poor results even when tested with data that is only slightly 
different from the testing Set. 

NeuralWorks provides a learning and testing scheme called Savebest to mitigate the effects 
of overtraining. The option Savebest trains the network for a user prescribed number of epochs, 
and tests the network with the testing file. The result is compared to the previous saved best 
network. The network is saved based on the criteria, i.e., highest classification rate, or lowest root- 
mean square error. This process repeats for the user prescribed number of failures to produce a 
new best. The network when finished will revert to the saved-best network, thus averting an 
overtrained network that memorizes the training set of data. 

In this study, the length of training sets before testing is set at 10,000 epochs and the 
number of retries, or failures to produce a new best, is set at 20. The criteria used was the overall 
Classification rate. 

The training and testing files were built using the signals as described in the previous 
chapters. Using MATLAB*, a matrix was constructed with each column representing the features. 
For example, in the case of the reduced-rank covariance method, the number of features was the 25 
model parameters per segment, in the case of the A-Trous algorithm with 4 voices per octave, the 
number of features was equal to 28 average energy values per voice and segment. The matrix was 
appended with the target vector consisting of six values of either a one or a zero corresponding to 
the output neuron designated to the classification of the data. The data was separated into training 
files and testing files by using two thirds of the smallest file as the minimum number of training 
files per class of data, which amounted to 86 files. The remaining one third, of the smallest 
category was used as minimum number of testing files per class. Training files and testing files 
were Set up for each of the feature extraction methods. The features include reduced-rank 
covariance(AR) coefficients, ALE pre-processed AR coefficients, wavelet average energy per 
scale for both Symmlet 8, and Coiflet 3 coefficients, ALE pre-processed wavelet average energy 
per scale for both Symmlet 8, and Coiflet 3 coefficients, AR and wavelet methods combined for 


44 


Symmilet and Coiflet coefficients, ALE pre-processed AR and wavelet combination for both 
Symmlet and Coiflet coefficients, and finally the average energy per voice per scale using the A 
Trous method for 3, 4,5,6, and 7 voices per octave. In all, 40 different networks were trained using 


the fifteen above categories. The twelve best networks are presented in the results. 
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V. RESULTS 

The top twelve NN implementations of all networks considered in this study are 
summarized in Table 5.1. Note that ALE based noise reduction was not applied when using the A- 
Trous implementation due to the high classification rate already achieved. The network numbers in 
the second column of the table represent the number of PEs in the input Jayer/ hidden layer 1/ 
hidden layer 2/ and the output layer. The overall classification rate is as defined earlier in Chapter 
IV section 6. In many of the feature extraction methods used, we tried to “clean” the signals by 
applying the ALE algorithm to reduce the noise and increase the ability of the NN to classify the 
signal. The last and second to last method used both the AR and Wavelet feature extraction 
methods on the signal and combined the features into one vector for the NN to classify. The last 


method took this approach one step further by using the ALE algorithm first. 







Scale) 
Scale) 
Scale) 
Scale) 


AR & Wavelet (Coiflet 3) 39/30/15/6 86.64 % 


ALE/AR & Wavelet (Coiflet 39/30/15/6 95.78 [% 
3) 


Table 5.1 Feature Extraction Performances 
















Table 5.1 shows that the highest scoring network is obtained for the A-Trous algorithm 
with six voices per scale. The best solution, requiring the least preprocessing and the smallest 
neural network, is, however the A-Trous algorithm with four voices per scale. Five networks 


achieved an overall classification rate in excess of 95 %. This is our self induced threshold for a 
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successful classification scheme. Note that performance could potentially be farther improved by 
opuumizing the neural network for each feature extraction method. 

Tables 5.2 to 5.13 and Figures 5.1 to 5.12 present the performances obtained with each of 
the feature extraction methods summarized in table 5.1. For clarity purposes, a detailed 
explanation of Table 5.2 is presented next. Tables 5.3 to 5.13 follow the same presentation. 

The 1“ row in Table 5.2 indicates the number of testing files presented to the NN for 
classification in each class of signal. The 1“ column indicates the number of testing files classified 
in a specific class. All other rows show the mean and standard deviation (STD) obtained at each 
output node. The classification rate (CR) is as defined in Chapter IV section 6. Finally, the 
number of files classified in each class of signal is included. 

Row 2 to 6 present individual performance results obtained for each class. For example, 
Row 2 shows that 56 files were classified as “Sperm Whale” data, and that 4] out of 51 files were 
correctly classified, leading to a classification rate CR = 81%. Miss-classified sperm whale data 
were classified as either killer whale (7 files) or earthquake files (3 files). In addition, average 
output node levels are presented. For example, the average output level obtained for the Sperm 
Whale output node when the NN is presented with Sperm Whale data is 0.6602, and its standard 
deviation (STD) is equal to 0.3602. Note that the Sperm Whale output node level significantly 
drops down to an average of 0.0656 when the NN is presented with other types of signals(as seen 
across the top row), as expected. The main diagonal starting from the upper left of the table to the 
lower right contains the number of correct classifications obtained for the testing data. 
Performance results contained in Table 5.2 are also graphically presented in Figure 5.1. Class 
number 1] through 6 refer to Sperm, Killer, Humpback, Gray, Pilot whales and earthquake data. 
Test files 1 to 51 represent Sperm Whale data, Test files 52 through 102 represent Killer whale 
data, test files 103 through 153 represent Humpback Whale test data, test files 153 through 204 
represent Gray Whale data, test files 205 through 256 represent Pilot Whale data, and test files 
257 through 306 represent earthquake data. 
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A. PERFORMANCE RESULTS OBTAINED USING REDUCED-RANK AR 
COEFFICIENTS 


1. NoALE Preprocessing 
Figure 5.1 and Table 5.2 present the NN output Classification results obtained using 
reduced rank coefficients (AR). 
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Figure 5-1 Distribution of neural network classifications obtained using reduced-rank AR 
coefficients. Overall classification rate: 84.50%. 


The overall classification rate of 84.50 % is somewhat disappointing. However, note that 
biological and earthquake AR frequency contents partially overlap. As a result, the NN has 


difficulties classifying the data correctly. 
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Table 5.2 Distribution of neural network classifications obtained using reduced-rank AR 
coefficients; overall classification rate: 84.5 % 
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2. ALE Preprocessing Applied to Data 

Figure 5.2 and Table 5.3 present classification results obtained when the underwater data 
is first pre-processed using the adaptive line enhancer (ALE) filter introduced earlier in Chapter II 
section 3. This filter is applied to emphasize the narrow band contents contained in the data and 
decrease wide-band noise. Next the reduced rank AR coefficients of the filtered data are computed 
to be used as feature parameters. Note that the overall classification performance decreased. A 
possible explanation is that the ALE step removes some of the unique AR- features of the signal, 


thereby making it more difficult for the NN to classify the data correctly. 
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Figure 5-2 Distribution of neural network classifications obtained with reduced-rank AR 
coefficients; ALE preprocessing applied to the data; overall classification rate: 83.94%. 
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Table 5.3 Distribution of neural network classifications obtained with reduced-rank AR 
coefficients; ALE preprocessing applied to the data; overall classification rate: 83.94%. 


B. CLASSIFICATION RESULTS OBTAINED WITH WAVELET-TYPE 
PARAMETERS 


1. Orthogonal Wavelet Decomposition 


a. No ALE preprocessing: 
This section presents results obtained for the classification scheme using 


orthonormal wavelet-type quantities as NN inputs. The resulting classification rate is nearly 
identical to that obtained with the reduced-rank AR method in the case of the Symmlet 8 basis set. 
However, this wavelet-based method offers the advantage of requiring fewer NN input parameters. 
Fourteen coefficients are used as compared to twenty-five coefficients for the reduced rank AR 
method. For this reason, this method represents a better solution even though the overall 
classification rate is the same. The results obtained using the Coiflet 3 basis set do not reach the 
level obtained by the AR and the Symmlet 8 basis set. Detailed results are contained in Figure 5-3 
and Table 5.4 for Symmlet 8 wavelet coefficients. Results for Coiflet 3 wavelet coefficients are 


contained in Figure 5-4 and Table 5.5. 
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Figure 5-3 Distribution of neural network classifications obtained using the orthonormal wavelet 
decomposition; Symmlet 8 basis set; overall classification rate: 84.67%. 
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Table 5.4 Distribution of neural network classifications obtained using the orthonormal wavelet 
decomposition; Symmlet 8 basis set; overall classification rate: 84.67%. 
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Figure 5-4 and Table 5.5 below displays the results of using Coiflet 3 basis set. This 
feature extraction technique achieved an overall classification of 78.05%. The results are 
disappointing in that they are far below that obtained by using the Symmlet 8 basis set and the AR 


neural networks. 
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Figure 5-4 Distribution of neural network classifications obtained using the orthonormal wavelet 


decomposition; Coiflet 3 basis set; overall classification rate: 78.05%. 
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Table 5.5 Distribution of neural network classifications obtained using the orthonormal wavelet 


decomposition; Coiflet 3 basis set; overall classification rate: 78.05%. 


b. ALE pre-processing applied to data 


This section presents classification results obtained when applying the ALE noise 
reduction technique to the data before using Mallat’s algorithm. In contrast to the AR process, the 
multiresolution algorithm greatly benefited from the ALE pre-processing. The classification rate is 
up to 95.17% for the Symmilet 8 basis set. Both the Symmlet 8 basis set and the Coiflet 3 basis set 
increased the overall classification rate by ten percent by first pre-processing with the ALE filter. 
Detailed results are presented in Figure 5-5 and Table 5.6 for the Symmlet 8 basis set next. Figure 
5-6 and Table 5-7 present the data for Coiflet 3 basis set. 


Distribution of Neural Net Output File 





Test File Number 


Figure 5-5 Distribution of neural network classifications obtained using the orthonormal wavelet 
decomposition; Symmiet 8 basis set; ALE pre-processing; overall classification rate: 95.17%. 
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Table 5.6 Distribution of neural network classifications obtained using the orthonormal wavelet 
decomposition; Symmlet 8 basis set; ALE pre-processing; overall classification rate: 95.17%. 
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Figure 5-6 and Table 5.7 display the results of the ALE Coiflet 3 basis set. The overall 
classification rate of 88.75% is a drastic improvement over the Coiflet 3 set alone, however it does 
not quite match the 95.17% achieved by the ALE Symmlet 8 basis set. It is an improvement over 
the AR coefficient method and did require a smaller neural network to implement. In this respect, 
it is a more successful implementation. It does not meet the 95% self induced threshold for 


SUCCESS. 
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Figure 5-6 Distribution of neural network classifications obtained using the orthonormal wavelet 


decomposition; Coiflet 3 basis set; ALE pre-processing; overall classification rate: 88.75% 
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Table 5.7 Distribution of neural network classifications obtained using the orthonormal wavelet 
decomposition; Coiflet 3 basis set; ALE pre-processing; overall classification rate: 88.75%. 
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2. Non-Orthogonal A-Trous Wavelet Decomposition 

This section presents classification results obtained when applying the A-Trous algorithm. 
Note the ALE noise reduction technique is not applied to the data as the performance is 
satisfactory without it. The A-Trous algorithm takes advantage of a better frequency resolution 
obtained when using multiple voices. Results show that classification performance is improved 
using this method as compared to the other methods used in this study. 

The A-Trous decomposition is implemented with four different combinations of voices. 
Four, five, six, and seven voices per scale are presented. Results show that the overall best | 
classification rate is obtained when using six voices per scale, however the size of the NN was 
significantly larger. Figures 5-7 through 5-10 and Tables 5.8 through 5.11 present the 


performance results. 


a. A-Trous Implementation with four voices per scale 


Distribution of Neural Net Output File 





Test File Number 


Figure 5-7 Distribution of neural network classifications obtained using the A-Trous 
implementation; 4 voices per scale; overall classification rate: 96.41%. 
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Table 5.8 Distribution of the neural network classifications obtained using the A-Trous 


implementation; 4 voices per scale; overall classification rate: 96.41% 
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b. A-Trous Implementation with five voices per scale 
Figure 5.8 and Table 5.9 display the results obtained from 5 voices pre scale. 


Note that the resultant overall classification rate is less than that obtained from four voices per 
scale. This was not expected. The expectation was that an increase in resolution of the feature 


extraction would produce an increase in the overall classification rate. 
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Figure 5-8 Distribution of neural network classifications obtained using the A-Trous 
implementation; 5 voices per scale; overall classification rate: 93.46 %. 
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Table 5.9 Distribution of the neural network classifications obtained using the A-Trous 
implementation; 5 voices per scale; overall classification rate: 93.46 % 
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c. A-Trous Implementation with six voices per scale 
This implementation of the A-Trous algorithm has the highest classification rate of 


all tested. Even though this implementation has the overall best classification rate, it also has more 
false earthquake classifications than either four voices or five voices per scale. In addition it 


requires a higher number of NN input coefficients and correspondingly a more complex NN. 
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Figure 5-9 Distribution of the neural network classifications obtained using the A-Trous 


implementation; 6 voices per scale; overall classification rate: 96.73%. 
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Table 5.10 Distribution of the neural network classifications obtained using the A-Trous 


implementation; 6 voices per scale; overall classification rate: 96.73 %. 
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d. A-Trous Implementation with seven voices per scale 


Results show an overall classification rate of 95.10%, as indicated earlier in Table 
5.1. Note that the NN implementation complexity has increased due to the larger number of input 
NN coefficients. This NN configuration has 49 PEs in the input layer, 49 PEs in hidden 1, 15 PEs 
in hidden 2, and 6 output PEs. This is the largest network considered in this study. Figure 5-10 and 
Table 5.11 present classification results obtained using the set of averaged wavelet-type 
coefficients. We note that the NN implementation obtained with four voices per scale has a better 


overall classification rate using a much smaller network. 
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Figure 5-10 distribution of neural network classifications obtained using the A-Trous 
implementation: 7 voices per scale; overall classification rate: 95.10 %. 
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Table 5.11 Distribution of neural network classifications obtained using the A-Trous 
implementation: 7 voices per scale; overall classification rate: 95.10 %. 
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C. CLASSIFICATION RESULTS OBTAINED BY COMBINING AR 
COEFFICIENTS AND ORTHONORMAL WAVELET-TYPE PARAMETERS 


This section presents classification results obtained by combining the ALE AR and 
wavelet feature coefficients. This technique is considered in this study to investigate the results 
when a combination of dissimilar approaches is applied. The premise behind this approach is the 
combination of different techniques would provide unique combinations of parameters for input 
into the NN. The combination may dramatically increase the classification rate over any single 
technique applied alone. It is the most labor intensive pre-processing technique applied in this 
Study, and requires a large, complex neural network. 

a No ALE Pre-processing Applied to the Data 

Figure 5-9 and Table 5.10 present the results obtained by combining the reduced-rank 
coefficients with wavelet - type parameters using the Coiflet 3 basis set. Redundant processing 
with different techniques should produce better results than either technique alone. This technique 


only slightly enhanced the classification rate of either alone. 


Distribution of Neural Net Output File 





Test File Number 


Figure 5-11 Distribution of neural network classifications obtained using the combination of 
reduced-rank AR coefficients and wavelet-type techniques; Coiflet 3 basis set; overall classification 
rate: 86.64%. 
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Table 5.12 Distribution of neural network classifications obtained using the combination of reduced- 
rank AR coefficients and wavelet-type parameters; Coiflet 3 basis set; overall classification rate: 
86.64%. 
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2. ALE Pre-processing Applied to the Data 

Figure 5-12 and Table 5.13 present the results obtained by first pre-processing the data 
using an ALE filter and next computing reduced-rank AR coefficients in combination with wavelet- 
type parameters. It produced an overall classification rate of 95.78 %. The results are impressive 


however, they came at the cost of pre-processing time and a very complex neural network. 


Distribution of Neural Net Output File 
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Figure 5-12 Distribution of neural network classifications obtained using the combination of ALE 
pre-processing, reduced-rank AR coefficients, and wavelet-type parameters; Coiflet 3 basis set; 
overall classification rate: 95.78 % 
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Table 5.13 Distribution of neural network classifications obtained using the combination of ALE 
pre-processing, reduced rank AR coefficients, and wavelet-type parameters; Coiflet 3 basis set; 
overall classification rate: 95.78%. 


VI. CONCLUSION AND RECOMMENDATIONS 


A. CONCLUSION 


This thesis investigated spectral based feature extraction techniques to input into a back- 
propagation neural network to resolve ocean biologic signals from those produced by earthquakes. 
The back-propagation neural network proved to be a very successful means of classifying the six 
categories of signals when used in conjunction with a good feature extraction technique. This 
thesis implemented a neural network classifier that can differentiate between earthquakes and five 
different species of whale with an overall classification rate exceeding 95 %. Specifically 
investigated were: 2 categories of feature extraction techniques, reduced-rank auto-regressive 
modeling and discrete wavelet transform based techniques. An adaptive line enhancer was 
investigated to improve the neural network results by removing uncorrelated noise. The specific 
results for the AR modeling, the discrete wavelet transform using Symmlet 8 and Coiflet 3 wavelet 
transforms, combining AR and DWT techniques, and finally the discrete wavelet transform using 
an A Trous method are summarized in the following paragraphs. 

The reduced-rank covariance method is used to produce AR models of the time domain 
signals. The choice of model number was determined systematically using three model order 
prediction techniques as discussed in Chapter III. The determination of usable singular values in 
the algorithm is determined visually, which proved to be only moderately successful. The 
combination of applying the ALE and using the reduced-rank covariance method produced results 
that were actually worse than using the reduced rank method alone. The most successful neural 
network implementation using this feature extraction technique had an overall classification rate of 
84.67 % as reported in Chapter V. 

Two implementations of the Wavelet Transform were considered; the Mallat’s algorithm, 
and the A-Trous algorithm. Mallat’s algorithm computes an orthonormal decomposition. This 
technique produced only moderately successful results on par with the reduced rank AR technique. 
The combination of applying the ALE and using the orthonormal wavelet based technique was 
somewhat more successful, however still not producing acceptable results. Investigated were 
Symmiet 8 and Coiflet 3 wavelet families. Without ALE pre-processing the neural network overall 
classification rates obtained were 84.67% and 78.05%. Using the ALE pre-processing, the results 


were brought up to 95.17% using the Symmlet 8 basis set. The Coiflet 3 basis set improved to 
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88.76%. Of note is the Symmlet 8 results are on par with the AR modeling technique with NN that 
are two thirds the size of using AR modeling. 

A combination of reduced-rank and orthonormal wavelet based techniques were 
investigated. The premise being that by combining the two techniques, a unique paring of the two 
methods would produce better results. The combination produced an overall classification rate of 
86.58%. Only a slight improvement was realized over either technique alone. This combination 
proved to be very successful when the ALE was also employed. An overall classification rate of 
95.78% was achieved. This level of processing for the feature extraction is very labor intensive, 
and involved using a large NN to classify the signals. 

The A-Trous based technique lead to the best performing technique considered in this 
study. This technique produced consistently highly successful results that did not need 
augmentation with a noise reduction technique. Two of the three highest classification rates were 
produced by neural networks implementing this technique. Three of the four NNs presented using 
this technique were above 95% in the overall classification rate. The technique that combined a 
high overall classification rate and a small NN is the A Trous with 4 voices per scale. The overall 
classification rate achieved was 96.41% with a network only slightly larger than that used for the 
AR modeling technique. Only one other extraction technique had a better overall classification rate, 
the A Trous with 6 voices per scale. The increase in resolution came at the expense of a very large 
NN and thus is not the optimal answer. 

The back-propagation neural network proved to be a very successful means of classifying 
the six categories of signals when used in conjunction with a good feature extraction technique. 
The high rate of success was achieved even though very little effort was made to optimize the 
neural networks for the data. Better results might be achieved by optimizing the neural network 
architectures for each feature extraction technique. Improvements may also be realized by 


employing a more sophisticated means of noise reduction. 
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B. RECOMMENDATIONS 


High classification rates obtained using the non-orthogonal wavelet based techniques are 
encouraging and warrant further study regarding the optimization of the NN implementation. 

The lack of success of the AR method may be indicative of the relative simplistic 
algorithm used in this study. ARMA modeling may provide a better solution while achieving a 
reduction in the order of the model. A potential mprovement of the reduced-rank covariance 
technique would be to automate self adjusting of the number of singular values chosen. The visual 
technique employed in this thesis was cumbersome and requires human intervention. 

In addition, an improvement in the de-noising algorithm may be achieved by employing a 
wavelet based de-noising technique suggested in [7]. The implementation of the ALE algonthm 
was hampered by applying the same technique to all signals. The application of a single technique 
did not provide the optimum de-noising solution to any of the signals, yet provided an engineering 
solution to de-noise the signals while not significantly degrading any single class of signal. 
Wavelet based techniques may enhance the ability to de-noise live ocean signals better than the 
ALE algorithm. 

Finally, improvement could be obtained by employing a neural network architecture that is 
capable of expanding to recognize new class of signals without having to be retrained on the 
signals it already recognizes. One such architecture that shows promise is the Fuzzy ARTMAP 


architecture mentioned in [10]. 
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APPENDIX 


ORDER.M 


function f[AIC,CAT,FPE,MDL, var ]=order(data,P) 

J ORDER uses the AIC, CAT, FPE, and MDL criteria to choose the 

% appropriate AR model order. Each output variable is a P-by-1 vector 

% with the quantities indicated. The index of the minimum of each quantity 
%o is the best theoretical model order. The variable "var" does not 

%. yield a minimum at the ideal model order; its usefulness 1s in examiming 
% the model order at which the variance becomes "flat." 


Jo 

% Usage: [AIC,CAT,FPE,MDL, var }=order(data,P) 

Jo 

% Input: data Sequence to be modeled 

YJ P Highest model order to test (all model orders from 
Yo 1 to P will be tested). 

Jo 

% Output: AIC AIC quantity (vector of length P) 

Jo CAT CAT quantity (vector of length P) 

% FPE FPE quantity (vector of length P) 

Jo MDL MDL quantity (vector of length P) 

Jo var Theoretical prediction error variance (vector of length P) 
Jo 


J ORDER calls the function BURG_A 


% Written by K. L. Frack Last Update: 16 March 1994 


Ns=length(data); 
me —zeros(P,1); 
CAT=zeros(P, 1); 
FPE=zeros(P, 1); 
MDL=zeros(P, 1); 
var=zeros(P, 1); 


for p=1:P; 
{a, var(p)]=burg_a(data,p); % Compute variance at each order 
AIC(p)=Ns*log(var(p))+2*p; % Compute AIC quantity 
SUM=0; 


for pp=1:p %\| Compute CAT quantity 
SUM=SUM+(Ns-pp)/Ns/var(pp); 
end 


CAT(p)=SUM/Ns-(Ns-p)/Ns/var(p); 

FPE(p)=var(p)*(Ns+p+1)/(Ns-p-1); % Compute FPE quantity 

MDL(p)=Ns*log(var(p))+p*log(Ns); % Compute MDL quantity 
end 


a7 


BURG_A.M 


function [a,var]= burg_a(data,P) 

% Burg's Method AR Model 

Jo 

% This function solves the AR model coefficients for the given data sequence 
% using Burg's method. 


Jo 

% Usage: [a,var] = burg_a(data,P) 

% 

% Input: data Data sequence 

Jo EB Desired model order 

% 

% Output: a Vector with "a" parameters ({1 al a2 ... aP]) 

Jo var Prediction error variance 

% Written by K. L. Frack Last Update: 15 March 1994 

[drow dcol]=size(data); 

if dcol > 1 %\| Ensures the data is a column vector 
data=data’; 

end 

N=length(data); 

ef=data(2:N); % Initial forward prediction error vector 

eb=data(1:N-1); % Initial backward prediction error vector 

a=[1]; % Initial “a" vector 

var=data'*data/N; % Initial error variance 

for k=1:P 
L=length(ef); 


gam(k)=(2*ef *eb)/(ef *ef+eb'*eb); %| Burg's algorithm performed for 
tef=ef(2:L)-conj(gam(k))*eb(2:L); %l each of the p iterations 
eb=eb(1:L-1)-gam(k)*ef(1:L-1); 


ef=tel: 
a=[a; 0]-conj(gam(k))*[0; flipud(conj(a))]; % update "a" vector 
var=var*(1-abs(gam(k))%2); % Update error variance 

end 

ARWHALE.M 


function [aw]=arwhale(q) 

% AR model of signal q using the reduced-rank covariance method 

% method reduces rank of estimared correlation matrix using the singular value decomposition 
% User visully picks number of singular values to keep program plots FFT ofsignal vs frequency responce 
% of AR model of segment 

% Returns matrix of a coefficients for model segments [25 X number of segments] 

% Written by LT R.C.Bennett, USN. 

% ref ar_covar.m by LT D.Brown USN 

% 

% last modified 9Oct94 

=o 

is=1: 

ANdex= 15 1272. 
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freq=findex*fs/5 12; 
Mo=25; %o%o%% model order 
%%% data is a multiple of 512 segments 


numseg=(floor(length(q)/512)); %%% number of segments 
disp({' the number of segments are ',snum2str(numseg)]) 
aw=zeros(Mo+1,numseg); 


for seg=1:numseg %% recursion to go through all 
%% segments of one "whistle" 
disp(['segment '.num2str(seg)]); 
Jo To To To To Fo To To To Fo Yo 
segment=q((seg*512)-511:(seg*512)); 
Jo To To Too Fo To To % % call svd_cov.m 
[a,b]=svd_cov(segment,Mo); 
[H, W]=freqz(b,a,512,'whole’); 


Par=abs(H); 
MagPar=20*log10(Par+eps); 
aw(:,Seg)= a; 


Jo To To % 
fg=abs(fft(segment,512)); 


SS=20*log10(fg+eps); 
%SS=SS(1:length(SS)/2+1); 
subplot(2, 1,2),plot(freq, MagPar(findex),'g',freq,SS(findex),'b’) 
title({Frequency Response of Model Segment ',num2str(seg),’ and Segment Spectral Content ']) 
xlabel(’ Sampling Frequency ') 
ylabel(Magnatude dB’) 


end %%%end of for loop 


SVD_COV.M 


function [ahat, bl]=svd_cov(x,Mo) 

%%% function ahat=svd_cov(x,Mo) 

%%% compute the AR coefficients of the data vector y 

%%% Mois the maximum model order. 

% This method combines the covariance method and the svd 

% method of rank reduction to model a segment of biological data. This method 
% reduces the noise by eliminating the singular 

% values assodiated with them 

% Written by LT R.C.Bennett. USN 

% last modified 9Oct94 


% ref AR_COVAR(x.P) by Dennis Brown. 
% ref Therrien,Discrete Random Signals And Statistical 
% Signal Processing, 1992, s 9.4 & 10 


% default returns 


a=[]:s=[J:R=[]:X=[]: 


ie 


% figure out if we have a vector 
if min(size(x))~=1, 

error(‘Input vector arg”x" must be an NX1 or an 1XN vector.'); 
end; %if 


%reshape vector to an Nx] 
x=x(:); 


ToT To To To Fo Vocom pute power in data 
pd= sum(x.*x); 

len=length(x); 

ToTo To Go %o% % Normalize energy 
x=x./sqrt(pd); 


To%o%o%o%o%o%o Pad the data vector out 
x=[x; zeros(Mo,1)]; 


To%o %o%o %o % % form the data matrix X 
X= zeros(length(x),Mo+1); 
t= length (x); 
for k=1:Mo+1 
DK =x: 
x=[x(t:t);x(1:t-1)]; 
end; %for 


To%o%o%o%o% take only non zero padded values 
X=X(Mo+ 1:length(x)-Mo,:); 


%%%I%% estimate correlation matrix 
R=, 
[m,n ]=size(R); 


ToTo To To To Fosolve using SVD version of normal equations 
%conjugate R 
R=flipud(fliplr(R)); 


% solve for a coefficeints 
Rx=R(2:m,2:n); 
p= -(R(2:m,1)); 
[(U,S, VjJ=svd(Rx); 
lamda=diag(S); 
figure(1), 
subplot(3,1,1), stem(amda,'b') %%o% want to see the singlar values 
%%% of the signal so to 
title({‘Singular Values of Segment ']) @%%separate the signal from the noise 


princip=ginput; %%% pick max number of singular values 
%o{princip ]=find(lamda > 0.5); 
%opc=max(princip) %%o% want to invert only principle coomponents 


pc=floor(princip(1.1)) 
%\amda=lamda-princip(1,2); %% Tsubtracts noise power 
To To To To To Po Fo Fo Yo % For psuedo inverse 


s= 1./lamda(1:pc); 
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Spi=diag(s); 


RXi= V(:,1:pc)*Spi*0¢, 1:pc)’; %o% 7% invert only principle components 
Jo To Fo To To To Fo Fo Go Fo % a coeficients 

a=RXi*p,; 

ev=(R(1,1)+sum(a.'.*R(1,2:n))); %%% error variance 

pev =ev/len; %%% prediction error variance 
ahat=[1;a]; %I%7% add a0 = 1 


To[o To To To To To% %o% generate model, compute power 
model = filter(1.ahat,[1;zeros(len-1,1)]); 
pm = sum(model.*model); 


Jo To ToT To To %0 Fo % % Compute gain (b0 in AR Model) Note This is a Fudge following D.Brown & prof 
Fargues 


b = sqrt(pd/pm); 
%%oTo%o To To Fo Vo%o % return as a row vector 
JY ahat = reshape(ahat,1,length(a)); 


%b = reshape(b, 1,length(b)); 


LMSALE.M 

function [w,y,e}=Imsale(x,M,mup,delay) 

%JLMSALE Adaptive least-mean square line enhancer. 
% [W,Y,E] = LMSALE(X,M,STEP,DELAY) implements 
% an adaptive line enhancer using the least-mean 

% squares approach where X is the input signal, 

% M is the number of filter weights, STEP is 

% the percentage of the maximum step size to use 

% in computing the next set of filter weights and 

% DELAY is the number samples to delay X in 

% computing the new weights. The final filter 

% weights are returned in W, the estimated signal 

% in Y and the error signal in E. 

% 

% The maximum step size is computed as 

% 

% maxstep = 2/(M * std(x)*2); 

% 

% 

Jo [W,Y,E] = LMSALE(X,WIN,STEP,DELAY) uses the 
% vector WIN as the initial guess at the weights. 

% The number of weights is equal to the length 

% of WIN. 


Jo LT Dennis W. Brown 3-10-93 

J Naval Postgraduate School, Monterey, CA 
% May be freely distributed. 

% Not for use in commercial products. 


% default returns 
y=[J;w=[]:e=[]; 
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% check number of input args 
if nargin ~= 4 

error(Imsale: Invalid number of input arguments...'); 
end 


% figure out if we have a vector 
if min(size(x)) ~= 1, 

error(‘Imsale: Input arg "x" must be a 1XN or Nx] vector."); 
end; 


% work with Nx] vectors 
x= xp: 


% initial weights 
w0 = zeros(M,]1); 


if nargin < 5 
pl = 0; 
end; 


% some parameters 


Ns=len gth(x); 
runs = std(x)‘2; 


% compute maximum Step size 
mumax = 2/(M * runs); 


% make mu less than 2/lamdamax using percentage 
mu = mup/100 * mumax; 


% start with initial guess for weights equal to the null vector 
w = wQ; 


% recursively compute the weights to three significant figures 
y=zeros(Ns, 1); % space for filtered signal 
e=zeros(Ns, 1); % space for error signal 
x1 = [zeros(M-1,1) ; x(1:M+delay, 1)]; 


% initial conditions set to zero 
for k=delay+1:M+delay- 1 


b = flipud(xi((k-M+1:k)-delay+M-1)); 


% compute filter output 
y(k) = w' * b; 


% compute error 
e(k) = x(k) - y(k); 


% compute new weights 
W=W+ mu * b * e(k); 


end 


% rest of data 
for k=M+delay:Ns 


b = flipud(x((k-M+1:k)-delay)); 


% compute filter output 
y(k) = w' * b; 


% compute error 
e(k) = x(k) - y(k); 


% compute new weights 
W = Wt mu * b * e(k); 


end 


ECOEFF2.M 


function [Ew,Ell] = Ecoeff2(signal) 

% Ecoeff2.m generates the energy per scale of the wavelet decomposed signal 

% using Coiflet,3 orthogonal wavelet transformation This program is also used compare wavelet 

% coefficients by changing the Wave_Type parameter. 

% Written by LT R. C. Bennett 

% last updated 10/16/94 

% calls Wave_Type.m, MakeONFilter.m and FWT_PO.m from Wavelab toolbox 

% 

Wave_Type = 'Coiflet’; par=3; 

signal =signal/std(signal); % normalizes the energy of the signal to 1.0 

qmf= MakeONFilter(Wave_Type,par); % calls TeachWave MakeOnFilter to build 
% the QMF Filter Bank 

Ew=[];El={]; 

L=2; % course level 


wsig=FWT_PO(signal,L,qmf); % Transforms signal returns vector of all 
% wavelet high pass coefficents 
{n,JJ=dyadlen gth(wsig); % dyad is an octave index power of two 
for j = J-1:-1:L 


% average energy in high pass coefficients 
~ Ew=(Ew,sum(wsig(dyad(j)).*2)/length(wsig(dyad(j)))]; 


% Inverse transform for low pass coefficents 
l(:.J-j)=zeros(n, 1); 
l(dyad(j),J-j)=wsig(dyad(j))’; clear temp 
temp=IWT_PO(I(:,J-j).j.qmf); 
ll(:,J-j)=temp; % identify low-pass coefs 
Ell=[Ell.sum(temp.*2)/length(temp)]: % lp coefs energy per scale 
oj 


end % for j 
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INWVLET.M 


function [Ew, Ell] = InWvlet(signal) 

%I%oInWviet.m 

% function to segment data into lengths of 512 samples, 
% and call Ecoeff.m to get the energy per wavelet scale 
% Written by LT R.C. Bennett 

% last updated 16O0ct94 

x=signal; 

numseg=floor(length(signal)/512); %I%% number of segments 
disp({' Number of segments are ',num2str(numseg)]) 
Ew=zeros(7,numseg); 

Ell=zeros(7,numseg); 


for seg= 1:numseg % %recursion to call Ecoeff for all segments of signal 
disp([‘segment ‘, num2str(seg)]) 

ml —x((see7 5 12)-511:(seg*512)) ; 

JoJo% %% call Ecoeff2.m 

% [ew,ellJ=Ecoeff(x11); 

[ew, ell ]=Ecoeff2(x11); 

ew=ew '; 

ell=ell; 
Ew(:,seg)=ew; 

EIl(:,seg)=ell; 

end %7% for seg loop 
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SHENDWT2.M 


Jo This MATLAB function calculates an approximation of the 
% DISCRETE WAVELET TRANSFORM of a function 

Jo using SHENSA’S ATROUS ALGORITHM discribed in "The 
% Discrete Wavelet Transform: Wedding the A Trous and 

% Mallat Algorithms," IEEE Transactions on Signal 

% Processing, Vol. 40, No. 10, Oct. 1992. 

% 

% The function syntax is: 

J [W,beta,nuJ=shendwt2(s, M, P_i, P_f,nc) 

J where: "s" represents a vector of data to be transformed 


% "M" is an integer indicating the number of "voices" 
% tu be used to cover the frequency spectrum 

% “W" is an array containing the coefficients of 

% the magnitude-squared wavelet transform 

% of "s." 

% "P_i" represents the first wavelet transform scale 

% to be calculated 

Jo "P_f" represents the final wavelet transform scale 


% The transform is calculated using a madulated Gaussian 

Jo window as an analyzing wavelet. If multiple voices are 

% specified, the projection of "s" on each voice will be 

% represented separately. The Lagrange, a trous interpolation 
% filter used is obtained from convolving a four-point 

% Daubechies scaling filter with itself. 

% Written by N.A. Hamlett, 1993 [16] 

% Modified by M.P. Fargues 1994 

function [W,beta,nu]=shendwt2(s, M. P_i, P_f,nc) 


J 
% First, the argument vector "s" is conditioned. If"s” is 
% defined as a column vector, it is converted to a row vector. 
% Secondly, "s" is zeropadded to the next integer power of "2." 
% 
J%iopt=input(‘option to use log scale: y/n 1/0'); 

iopt=0; 


[rows, cols ]=size(s); 
% 
if cols == 
Se 
end 
clear rows;clear cols; 
J 
s=zeropad(s, 1,2ceil(log(length(s))/log(2))); 
% 
% Default values are imposed for starting and ending scales if 
J none are specified. 
% 
if exist(P_i}) == 0 


Pasi: 
end 
iL Cxist( Ef) == 
P_f=floor(log(length(s))/log(2)); 
end 
Jo 
% If the number of voices is not specified, a default value of 
% M=2 is imposed. 


% 

if exist(M') == 0 
M=Z. 

end 

% 


Jo Next the analyzing wavelet must be calculated. The starting 
Jo point of this process is to specify the Gaussian window rolloff 
% factor "beta" in accordance with the specified number of 
% voices. (Shensa (6.31)). If M=1, the value of "beta" is defined 
% as "pi/(4*sqrt(2))." 
% 
%fopt=input(‘automatic selection of beta and k: y/n (1/0): '); 
fopt=0; 
if fopt==0 
% beta=input(‘beta: '); 
beta= .15; 
% nu=input(‘nu: '); 
nu= .85; 
if nu==0,nu=pi-sqrt(2)*beta;end 


else 
if vi == 
% 
beta=pi/(4*sqrt(2)); 


% else 


% beta=1/(2*M); 
% The location of the center frequency "nu" is assigned in accordance 
%. with Shensa (6.27). 
% 
end 
Jonu=pi-sqrt(2)*beta; 
end 
% 
J The voice scaling factor "a" is calculated using Shensa (6.32). 
Jo 
a=24(1/M); 
Jo 
Jo The region of support for the analyzing wavelet filter "g" is 
% approximated as the region for which the Gaussian window 
% is greater than 10*-3. Consequently, the filter impulse 
% response domain is {-a*(M-1)*sqrt(14)/beta, a*(M-1)*sqrt(14)/beta]. 
Jo The length of "g" is represented by an odd integer. 
% 
n=-ceil(a*(M-1)*sqrt(14)/beta):ceil(a*(M-1)*sqrt(14)/beta); 
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% The analyzing wavelet is calculated for each voice in accordance 

J _ with Shensa (6.32). It is then normalized such that its peak 

% passband frequency response is unity. 

Jo 

for k=1:M 

J 
o(k,:)=exp(j*nu*(n/(a*(k-1)))).*exp(-(beta*2* (n/a*(k-1)).%2)/2); 

Jo g(k,:)=g(k,:)/max(abs(freqz(g(k.:),1,512))); 

end 

clear n; 

Jo 

% A variable "G" indicating the half-length of filter "g" is 

J defined. 

G=ceil(length(g)/2) 

% 

Jo 

Jo Next, the Lagrange interpolation filter is calculated. The filter 

Jo is obtained from "auto-convolution" of a Daubechie four-point 

% DWT filter. 

Jo 

f=[1+sqrt(3) 3+sqrt(3) 3-sqrt(3) 1-sqrt(3)]/(4*sqrt(2)); 

% 

f=conv(f,flipIr(f)); 

f=f/sqrt(2); 

Jo 

% A variable "F" indicating the half-length of "f" is defined. 

Jo 

F=ceil(length(f)/2); 

Jo 

Jo The recursion is next executed according to Shensa (2.12a & b). 

% The sequence is filtered and decimated the first "P_i" times. 

% 


for k=1:P_1-1 
Jo 
s=conv(s,f); 
% 
s=s(F:2:length(s)); 
Jo 
end 
Yo 


J The output matrix "W" is initialized as a zero vector of 
Jo dimensions identical to those of "s." 
% 
W=zeros(size(s)); 
Jo 
for k=P_i:P_f 
% 
Jo The data vector "s" is first filtered with each voice of "g." 
% The squared magnitude of the result is assigned to the appropriate 
% column of “W." 
J 
for n=(k-P_i)*M+1:(k-P_i+1)*M 
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% 
%G The row of "W" to be evaluated is initialized as zero. 
W(n,:)=zeros(size(W(1,:))); 


J The magnitude of the filter output is calculated 
% and assigned to "Wk." 


Wk=abs(conv(s, g(n-(k-P_i)*M,:))); 
Wk_ph= conv(s, g(n-(k-P_i)*M,:)); 
% 
% The elements of Wk" are assigned to the corresponding 
% columns of the row of "W" being calculated. 
% 
JoW (n,2(k-P_i):2(k-P_i):len gth(W))=Wk(G:len gth(Wk)-G+1); 
if 1opt==0 
W(n, 1:24(k-P_i):length(W))=Wk(G:length(Wk)-G+ 1); 
else 
for k1=G:length(Wk) 
if Wk(k1)<eps, Wk(k1)=-100; 
else Wk(k1)=10*log(Wk(k1)); end 


end 
W(n, 1:24(k-P_i):length(W))=Wk(G:length(Wk)-G+1); 
end 
Jo 
end 
% 
% 
% "s" is filtered with the lagrange interpolation filter and then 
% decimated. 
Jo 
s=conv(s.f); 
Jo 
s=s(F:2:length(s)-F); 
end 
Yo 
% plot 
% figure, 


[mw,nw ]=size(W); 
%mesh(rot90(abs(W ))) 

% ylabel (‘time’) 

% xlabel(‘octave’) 

figure 

contour(W,. nc, 1:nw,(0:mw-1)/M) 
s=['DWT a-trous ‘.num2str(M),' voices’ ]; 
% title(s); 

xlabel(‘time’);ylabel (‘octave’) 


LOADW3V7t.M 


% Loads feature extracted data , in this instance the data from A Trous 7 voices per scale data. 
% was used in all cases to load data, append the target data and write in NeuralWorks 
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% “nna” format. This version losds the testing file, the training file used same format exceptloaded the 
% first 86filesfor each classification of signal. 

Jo. loadw3v7t.m 

% Written by LT R.C. Bennett, 

% &8DEC94 

clear % load feature extracted data matrix 
load ew3v7.dat; 

load gw3v7.dat; 

load hw3v7.dat; 

load kw3v7.dat; 

load sw3v7.dat; 

load pw3v7.dat; 


e=ew3v7(:,87:137); 
g=gw3v7(:,87:137); Zoscale each input to be same size per whale 
k=kw3v7(:,87:137); 
=sw3v7(:,87:137); 
p=pw3v7(:,87:137); 
h=hw3v7(:,87:137); 
ae=size(e); 
ag=size(g); 
ah=size(h); 
ak=size(k); 
as=size(S); 
ap=size(p); 
whales=[s kh g pe]'; 
% Targets %o% appends the known classification to the end of the train/test file 
tar getl=zeros(6,as(1,2)); 
target] (1,:)=ones(1,as(1,2)); 
tar get2=zeros(6,ak(1,2)); 
tar get2(2,:)=ones(1,ak(1,2)); 
target3=zeros(6,ah(1,2)); 
tar get3(3,:)=ones(1,ah(1,2)); 
target4=zeros(6,ag(1,2)); 
target4(4,:)=ones(1,ag(1,2)); 
tar get5=zeros(6.ap(1,2)); 
tar get5(5,:)=ones(1,ap(1,2)); 
tar get6=zeros(6,ae(1,2)); 
tar get6(6,:)=ones(1,ae(1,2)); 
targets=[target] target2 target3 target ... 
targetS target6]; 
tar gets=targets ; 
x=[whales targets]; 
To To To To To To To To Ge Vo Vo [oe %e rewrite in .nna format 
ee: 
{nlines, ncols]=size(x); 
N_lines=floor(nlines/6); 
N_end=nlines-N_lines*6; 


%o%o%% To write .nna file 
fid=fopen(‘w3v7t.nna’,'w’'); 
for kp= I:ncols 
% first line 
fprintf(fid.” 211.8f %11.8f %11.8f. x(1.kp), x(2.kp). x(3.kp)) 
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fprintf(fid,’ %11.8f %11.8f %11.8f\n\r', x(4,kp), x(5,kp), x(6,kp)) 


for kl=1:N_lines-1 %lines 2 to end 
fprintf(fid,'& %11.8f %11.8f %11.8f,x(6*kl+1,kp),x(6*kl+2,kp),x(6*ki+3,kp)) 
fprintf(fid,’ %11.8f %11.8f %11.8f\n\r',x(6*k1+4,kp),x(6*ki+5,kp),x(6*kI+6,kp)) 
end %for k} 
if N_end ~=0 % complete for the last line 
k]=N_lines; 
if N_end >= 1, fprintf(fid,,& %11.8f,x(6*kl+1,kp));end 
if N_end >= 2, fprintf(fid,' %11.8f,x(6*kl+2,kp));end 
if N_end >= 3, fprintf(fid,’ %11.8f,x(6*ki+3,kp));end 
if N_end >= 4, fprintf(fid,' %11.8f,x(6*kl+4,kp));end 
if N_end >= 5, fprintf(fid,' %11.8f,x(6*kI+5,kp));end 
end %if 
fprintf(fid, '\n\r'); 
end %for kp 
fclose(fid) 
lmwrite w3v7t.nna a: 
% 'eject 


SPECT.M 


% function [GG,X]=spect(N_voic,Max_sc,beta,nu_pi) 
% N_voic: number of voices 
% Max_sc: maximum scale 
% beta and nu(in fraction of pi) as defined in Shensa 
J Written by M.P. Fargues 1993 
function [GG,X]=spect(N_voic,Max_sc,beta.nu_pi) 
%clyg 
ip=input(‘plot? :’); 
M=N_voic;Nmax=Max_sc;nu=nu_pi*pi; 
clear GGG 
nsamp=100;nover=50; 
nsampt=nsamp+2*nover; 
Ntot=Nmax*nsampt; NS=M*Nmax; 
for 1i0=1:Nmax 
i2=2/(10-1); 
for i=-nover:nsamp+nover- 1 
1i=1+nover+1; 
omega=pi/(2%10) + (pi/(2%10))/nsamp*(1); 
for k=1:M 
kk=k-1; 
temp=exp(-(12*2*(kk/M)*omega -nu)*2/(2*beta’2) ); 
ksamp=(10-1)*M+k; 
G(NS+1-ksamp. Ntot-i0*nsampt+ii)=12* sqrt(2*pi)*2’ (kk/(M))*temp/beta; 
end 
end 
end 


1=0; 
for k=1:Nmax-1 
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for k2=1:M 
temp=dec( G(M*(k-1)+k2,nsampt*(k-1)+1:nsampt*k),2“(Nmax-k)); 
temp2=G(M*(k-1)+k2,nsampt*(k-1)+1:nsampt*k); 
nl=length(temp); 
if k==1, GG(k2,1:nl)=temp; 10=(nsamp+nover)/(2“(Nmax-k)); 
else 
l=1l-nover/(2“(Nmax-k)); 
GG( M*(k-1)+k2,11+1:1l+nl=temp;l0=nsamp/(2“(Nmax-k)); 
end 
end 
l=1+10; 
end 
nl=length(G(M*(Nmax-1)+1:M*Nmax,nsampt*(Nmax-1)+1:nsampt*Nmax-1)); 
li=l-nover; 
GG(M*(Nmax-1)+1:M*Nmax, I+ 1:l+nl)=G(M*(Nmax-1)+1:M*Nmax,nsampt*(Nmax- 
1)+1:nsampt*Nmax-1); 


naxis=length(GG); 

% x1=2%(-(Nmax-1));x2=1+(nover-1)/(2*nsamp); 
x1=2*(-(Nmax))*(1-(nover/nsamp));x2=1+(nover-1)/(2*nsamp); 
for q=1:naxis 

X(q)=x 1+(q-1)*(x2-x1)/naxis; 

end 


X=X/2; 

%subplot(211),plot(G') 

if ip==1 

Z%subplot(212) 

plot(X,GG’') 

uide({'A Trous Impl.:’;num2str(M),’ voice(s) - beta:'snum2str(beta),’ nu:’,num2str(nu_pi), pi'}) 
xlabel(normalized frequency’) 

ylabel(‘magnitude’) 

end 


CALLOUT.M 


% callout.m 

% Written by LT R.C. Bennett 

% 8DEC94 

% loads NN output files for display and 

% stauisics for Chapt 5 for mean and std of output nodes of NN. 
% calls oumnr.m 

clear 

load c:\nw2v50\w3v7t.nnr % non-orthogonal 7 voices 
load c:\nw2v50\w3v4t.nnr 9% non-orthogonal 4 voices 
load c:\nw2v50\w3vSt.nnr % non-orthogonal 5 voices 
load c:\nw2v50\w3v6tnnr % non-orthogonal 6 voices 


load c:\nw2v50\tar.nnr % AR model coefficients alone 


load c:\nw2v50\ttle_S5.nnr % ALE AR models 
load c:\nw2v50\wvl2tle.nnr % ALE, Wavelet, Coiflet 3 


ol 


load c:\nw2v50\wvlettle.nnr % ALE, Wavelet, Symmlet 8 
load c:\nw2v50\wvlettst.nnr % Wavelet alone. Symmlet 8 

load c:\nw2v50\wvlw2tst.nnr % Wavelet alone. Coiflet 3 

load c:\nw2v50\bothawts.nnr % AR & Wavelet, Coiflet 3 

load c:\nw2v50\btstw2.nnr % ALE, AR, & Wavelet, Coiflet 3 


figure %7 Voices Per Octave 

waterfal(w3v7t(:, 7:12)'), tithe( Distribution of Neural Net Output File’), 
ylabel('Class’) 

xlabel(‘Test File Number’), 

axis({O max(size(w3v7t)) 0 6 min(@min(w3v7t)) max(max(w3v/7t))]), view({23,82]) 
%[mn,sd]=outnnr(w3v/t) 

print 7voice -deps 

%figure,meshc(mn(:,7:12)), utle’?Mean of w3v7t’) 


figure %4 Voices Per Octave 

waterfal(w3v4t(:, 7:12)'), itle(Distribution of Neural Net Output File’), 

ylabel (‘Class’) 

xlabel('Test File Number’) 

axis({O max(size(w3v4t)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view(({23,82]) 
%[mn.sd]=outnnr(w3v4t) 

print 4voice -deps 

% figure,meshc(mn(:,7:12)), title” Mean of w3v7t’) 


figure%5 Voices Per Octave 

waterfal(w3v5t(:, 7:12)'), itle(Distribution of Neural Net Output File'),7~2%%% 
xlabel(‘Test File Number’) 

ylabel(‘Class’), 

axis({O max(size(w3v5t)) 0 6 min(min(w3v5t)) max(max(w3v5t))]}), view({23,82]) 
%[mn,sd]=outnnr(w3vot) 

print 5voice -deps 

%figure,meshc(mn(:, 7:12)), tithe Mean of w3v7t’) 


figure%6 Voices Per Octave 

waterfal(w3v6t(:, 7:12)'), title( Distribution of Neural Net Output File’), 
xlabel('Test File Number’) 

ylabel(‘Class’), 

axis({O max(size(w3v6t)) 0 6 min(min(w3v7t)) max(max(w3v7t))]). view({23.82]) 
%{mn,sd]=outnnr(w3 vot) 

%figure,meshc(mn(:,7:12)), title(Mean of w3v7t') 

print 6voice -deps 


figure %tar AR Coefficients 

waterfal(tar(:, 7:12)'), title(Distribution of Neural Net Output File’), 
xlabel('Test File Number’) 

ylabel(‘Class’) 

axis((O max(size(tar)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view({23,82]) 
%{mn,sd]=outnnr(tar) 

print tar -deps 

% figure,meshc(mn(:,7:12)), title’? Mean of w3v7t’) 


figure % ttle_5 ALE AR Coefficients 
waterfal(ttle_5(:, 7:12)'), tittle(Distribution of Neural Net Output File’). 
xlabel(‘Test File Number’) 


ylabel (‘Classification’) 

axis({O max(size(ttle_S)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view([23,82]) 
%{mn,sd]=outnnr(ttle_5) 

print ttle_5 -deps 

%figure,meshc(mn(:,7:12)), utle(Mean of w3v7t') 


figure % wvlettst Wavelet (Symmlet 8) Energy per octave 

waterfal(wvlettst(:, 7:12)'), title(Distribution of Neural Net Output File’), 
xlabel(’Test File Number’) 

ylabel(‘Class’) 

axis({O max(size(wvlettst)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view([23,82]) 
% {mn ,sdJ=outnnr3(wviettst) 

print wvlet -deps 

%figure,meshc(mn(:,7:12)), title(“Mean of w3v/t’) 


figure % wvlw2tst Wavelet (Coiflet 3) Energy per octave 

waterfal(wvlw2tst(:, 7:12)'), title(Distribution of Neural Net Output File’), 
xlabel('Test File Number’) 

ylabel('Class’) 

axis({O max(size(wvlw2tst)) 0 6 min(@min(w3v7t)) max(max(w3v7t))]), view([23,82]) 
[mn,sd]=outnnr3(wvlw2tst) 

print wvletC3 -deps 

%figure,meshc(mn(:,7:12)), utle(“Mean of w3v7t') 


figure % wvlettle ALE Wavelet (Symmlet 8) 

waterfal(wvl2tle(:, 7:12)'), title(Distribution of Neural Net Output File '), 
xlabel(‘Test File Number’) 

ylabel(‘Class’) 

axis({O max(size(wvl2tle)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view({23,82]) 
%[mn,sd]J=outnnr3(wvlettle) 

print wvls8ale -deps 

%figure,meshc(mn(:,7:12)), title’?Mean of w3v7t’) 


figure % wvl2tle ALE Wavelet (Coiflet 3) 

waterfal(wvl2tle(:, 7:12)'), title(‘Distribution of Neural Net Output File '), 
xlabel(’Test File Number’) 

ylabel (‘Class’) 

axis({O max(size(wvl2tle)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view([23,82]) 
%[(mn,sd J=outnnr3(wvl2tle) 

print wvlC3ale -deps 

%figure,meshc(mn(:,7:12)), utle(¢Mean of w3v7t') 


figure % bothawts AR & Wavelet (Coiflet 3) 

waterfal(bothawts(:, 7:12)'). itle( Distribution of Neural Net Output File’), 
xlabel(‘Test File Number’) 

ylabel(‘Class’) 

axis({0 max(size(bothawts)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view([23,82]) 
%[mn,sd ]=outnnr3(bothawts) . 

print barwvC3 -deps 

%figure,meshc(mn(:,7:12)), utle(Mean of w3v7t’) 


figure % btstw2 ALE AR & Wavelet (Coiflet 3) 


waterfal(btstw2(:, 7:12)'), utle( Distribution of Neural Net Output File’), 
xlabel(‘Test File Number’) 


os 


ylabel(‘Class’) 

axis([O max(size(btstw2)) 0 6 min(min(w3v7t)) max(max(w3v7t))]), view({23,82]) 
%(mn,sd]=outnnr3(btstw2) 

print abarwvC3 -deps 

%figure,meshc(mn(:,7:12)), utle’Mean of w3v7t) 


OUTNNRA.M 


Zoumnr.m 

% For calculating the mean and std of output from nnr format 

J for all A Trous non-orthogonal wavelet energy per voice per octave coefficients 
% Written by R.C.Bennett 

% ADEC94 

function [mn,sdJ=outnnr(nnr) 


sperm=nnr(1:51,:); 

kilier=nnr(52:102,:); 

humpback=nnr(103:153,:); 

gray=nnr(154:204,:); 

pilot=nnr(205:255,:); 

earth=nnr(256:306,:); 

msperm=mean(sperm); 

sdsperm=std(sperm); 

mkiller=mean (killer); 

sdkiller=std(killer); 

mhump=mean(humpback); 

sdhump=std(humpback); 

mgray=mean(gray); 

sdgray=std( gray); 

mpilot=mean(pilot); 

sdpilot=std(pilot); 

mearth=mean(earth); 

sdearth=std(earth); 

mn=[msperm; mkiller; mhump; mgray; mpilot; mearth]; 
sd={sdsperm; sdkiller; sdhump; sdgray; sdpilot; sdearth]}; 
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